My calculus book uses many different notations for the derivative of y=f(x) with respect to x, such as
$\frac{dy}{dx}$, $y'$, $f'(x)$, $Df(x)$, $\frac{df}{dx}$.
Recently I’ve found that I kind of object to a couple of these. For instance, consider $\frac{df}{dx}$. One of the things I try to teach my students is that when we define a function $f$ by writing $f(x)=x^2$, say, the variable $x$ is a dummy variable; if we wrote $f(t)=t^2$ we would be defining the same function, namely the one which squares its input. But if “$f$” denotes a function of this (usual mathematical) sort, then how can we write $\frac{df}{dx}$ to mean its derivative, since $f$ doesn’t know that we called its input variable $x$?
Notations like $f'(x)$ and $Df(x)$ don’t have this problem, because $f'$ and $Df$ denote the derivative function of $f$, which assigns to each input value the derivative of $f$ at that value, and so $f'(x)$ and $Df(x)$ just mean evaluation of this function at that value. I suppose we could interpret $\frac{df}{dx}$ similarly if we regarded “$\frac{df}{d}$” as the derivative function of $f$, which we evaluate at something by placing it after the $d$ in the denominator, but this seems strained, and would suggest even odder notations such as writing $\frac{df}{d3}$ for $f'(3)$.
It was pointed out to me that this kind of notation is even more common in multivariable calculus, where we write things like $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, and in this case there aren’t good alternatives available, since we have to indicate somehow whether it is the first or second input variable of $f$ with respect to which we take the derivative.
From a differential-geometric viewpoint, one answer is to say that $x$ denotes a standard coordinate function on the 1-dimensional manifold that is the domain of $f$, and so we are actually taking the derivative with respect to a vector field associated to that function. We can even regard $\frac{df}{dx}$ as a literal quotient of differential 1-forms, since 1-forms on a 1-manifold are a 1-dimensional vector space at each point, so the quotient of two of them is a real number. But while logically consistent, this seems to undercut the force of the lesson of dummy variables, since we are endowing $x$ with a special status not shared by $t$.
Using $x$ to denote the coordinate function has the other interesting consequence that it makes it okay to say “the function $x^2+1$” (since we can multiply and add functions together), instead of insisting on saying “the function $f$ defined by $f(x)=x^2+1$”. Again this feels like it undercuts the lesson of what a function is, and yet I find that it’s hard to teach a calculus class without eventually slipping into saying “the function $x^2+1$”. With $x$ as a function we can also write “$f=x^2+1$”, which again is something that I’m used to indoctrinating my students against.
I also have a problem with the notation $y'$, for a more pragmatic reason. Suppose we want to take the derivative of $y=(3x+1)^4$ using the chain rule. A nice way to do it is to make a substitution $u=3x+1$, so that $y=u^4$, and then use differentials:
$$du = 3\,dx$$
$$dy = 4u^3\,du = 4(3x+1)^3\,(3\,dx)$$
$$\frac{dy}{dx} = 12(3x+1)^3$$
The problem here is that the notation $y'$ doesn’t indicate what variable we differentiate with respect to, and in this calculation we have two derivatives of $y$, namely $\frac{dy}{dx}=12(3x+1)^3$ and $\frac{dy}{du}=4u^3$, which are not equal even after substituting the value of $u=3x+1$. Here the solution seems to be straightforward: just don’t write $y'$. But if we can write $f=x^2+1$ just like $y=x^2+1$, and if we allow the notation $f'(x)=2x$ and hence also $f'=2x$, then we should just as well have $y'=2x$.
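Written out step by step, the computation looks as follows. This is only a sketch: it uses nothing beyond the substitution $u = 3x+1$ from the post, and the evaluation point $x = 0$ is chosen here purely for illustration.

```latex
% Requires amsmath. Chain rule via differentials for y = (3x+1)^4.
\begin{align*}
u &= 3x + 1, & du &= 3\,dx, \\
y &= u^4,    & dy &= 4u^3\,du = 4(3x+1)^3 \cdot 3\,dx = 12(3x+1)^3\,dx, \\
\frac{dy}{du} &= 4u^3, & \frac{dy}{dx} &= 12(3x+1)^3. \\
\intertext{At $x = 0$, so that $u = 1$ (an illustrative choice of point):}
\left.\frac{dy}{du}\right|_{u=1} &= 4, & \left.\frac{dy}{dx}\right|_{x=0} &= 12.
\end{align*}
```

The two values differ by the factor $\frac{du}{dx} = 3$, which is exactly the information that the bare notation $y'$ fails to record.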
Does anyone have a good solution? I feel like at least part of the problem comes from confusing ℝ as the real numbers with ℝ as a 1-dimensional manifold, but I haven’t exactly managed to pin down yet how to solve it from that point of view.
This doesn’t answer the broader question, but there is a “better” alternative for the multivariable calculus notation. It is pretty typical to use $f_y(x,y)$ to denote the partial derivative of $f$ with respect to $y$.
For what it’s worth, Mathematica writes multivariable derivatives as $f^{(i,j,\ldots,k)}$, which means the $i$-th partial derivative with respect to the first variable, the $j$-th partial derivative with respect to the second variable, etc. Unfortunately, this notation presupposes the commutativity of partial derivatives.
On $f'$ and $\frac{dy}{dx}$: I think you are right there. In order to use notations like $f'$ consistently, it appears to be necessary to abandon notions like “change of variable” and regard $x\mapsto(3x+1)^4$ and $u\mapsto u^4$ as distinct functions. So it is incompatible with the setup where calculus expressions are regarded as functions on some manifold, because there are no preferred coordinates. From a syntactic point of view it does seem rather disturbing that the $x$ in the denominator of $\frac{dy}{dx}$ looks bound but is free in $\frac{dy}{dx}$ itself. It’s almost as if $\frac{d}{dx}$ is some kind of variable binding operator like $\lambda$ or $\prod$ or $\sum$… except for the fact that it doesn’t bind the variable at all! Compare:
$$x:\mathbb{R} \vdash y \equiv (3x+1)^4 : \mathbb{R}$$
$$x:\mathbb{R} \vdash \frac{dy}{dx} \equiv 12(3x+1)^3 : \mathbb{R}$$
$$\vdash \lambda x.\,(3x+1)^4 : \mathbb{R}\to\mathbb{R}$$
Accordingly, we should also require the use of substitutions instead of evaluations when working with the $\frac{dy}{dx}$ notation: so $\left.\frac{dy}{dx}\right|_{x=0}$ instead of $\frac{dy}{dx}(0)$.
What would elementary calculus look like in cohesive homotopy type theory?
David,
with cohesion one can essentially characterize a map $d\colon \mathbb{R}\longrightarrow\Omega^1_{cl}$ such that postcomposition of a function $f\colon X\longrightarrow\mathbb{R}$ with this map yields the derivative $df\in\Omega^1_{cl}(X)$ of $f$. This is discussed in some detail at geometry of physics in the section 4. Differentiation
A variant is a certain homotopy pullback of this construction, which yields variational calculus, as discussed there in the section In terms of smooth spaces.
In differential cohesion one can get hold of the infinitesimal interval D and then proceed as in synthetic differential geometry. For instance using this one can describe differential equations as discussed there in the section In terms of synthetic differential equations.
Moreover, differential cohesion encodes D-geometry and hence in principle allows one to talk about differential equations in that way.
I have added my previous reply as a paragraph to differential calculus – In cohesive homotopy theory
Thanks. I meant my question to see if perhaps in some idealised setting where Mike has complete control over his students’ maths education, so has taught them HoTT from an early age, now when he comes to teach calculus, and he adds the cohesive axioms, is he ever confronted with the issues he raised in #1?
He has $f\colon\mathbb{R}\to\mathbb{R}$, with $f(x)=(3x+1)^4$. So $f=g(h)$ for obvious $g$ and $h$. So then using your $d$,
$$df=dg(h),$$
at which point a chain rule kicks in. (I’m waiting for geometry of physics to load to read Prop. 26, but it takes about 10 minutes to typeset on this machine!)
So all of Mike’s worries are over!
I’m waiting for geometry of physics to load to read Prop. 26, but it takes about 10 minutes to typeset on this machine!
Oh, that’s a pain. For me it’s slow, but not quite this slow.
This is with math rendered by MathJax, I suppose? I suppose in some other browser, and/or with the requisite fonts installed, it should take no extra time?
Once there was this vision that we have decent math on the web. But somehow it still seems to be a long, long way to go, for some reason.
So all of Mike’s worries are over!
While I suppose you are joking (right? :-) this reminds me that maybe we shouldn’t hijack Mike’s thread too much.
My reply to the topic here would be: since we are humans and not proof assistants, whenever we actually do some work we’ll adopt convenient “abuse of notation”. One should alert students as to what’s really going on, but I wouldn’t worry too much about enforcing a formally consistent notation.
It wasn’t completely a joke. There was something like the thought that if cohesive HoTT is the God-given way to do things, then it might suggest the least bad forms of “abuse of notation”.
@hilbertthm60: I don’t understand how $f_y(x,y)$ is better. You still have the variable $y$ occurring in the notation $f_y$ for the partial derivative function. Am I misunderstanding?
@Urs: It’s true that we always abuse notation, but I find that the correct and incorrect ways to abuse notation are one of the hardest things for beginning math students to understand. I haven’t spent a lot of time thinking about it, but I’ve generally assumed that it’s not really possible to understand how to abuse notation until you understand “the way mathematics works” at a sufficiently deep level, so that when teaching students who don’t yet understand math, it’s better to try to avoid abusing notation as much as possible.
From the perspective of Logic Programming, and a particular variant, “variables” are a notational solution that conflates 2 things: labeling and binding.
Traditionally, formulas are written as linear strings of symbols that are parsed into trees. One use of variables is as external labels to note that parts of a “tree” share the same substructure, so that it is not really a tree.
For example in the formula x+x the two xs can be seen as the same substructure and substituting “1” for “x” involves changing just one substructure, not 2. The result of the substitution, “1 + 1”, has dropped any labeling that indicates that the two “1”s are really the same and came from the same place. One can explicitly use external labels for this situation using a name followed by “?”. Then the two formulas would be notated as x?⊤+x?⊤ and x?1+x?1 where ⊤ indicates that the first x? structure is “unbound”.
Binding traditionally has two states - unbound or bound to something totally specific. The alternative perspective is that “binding” is a continuum that takes place in a lattice of structures where ⊤ means “completely unknown” and ⊥ means “totally contradictory”.
In an expression like x+x, x is rarely completely unknown. Usually x is known to be some specific type of number, but which exact number is unknown. One could even give x an intermediate binding such as 1∨2. Evaluating x?(1∨2)+x? gives the result 2∨4 while if the structure is not shared, evaluating x?(1∨2)+y?(1∨2)? gives 2∨3∨4.
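The two evaluations can be spelled out by enumerating cases. This is only a sketch: it reads the intermediate binding $1\vee 2$ as the set of candidate values $\{1,2\}$, which is an interpretation assumed here rather than something fixed by the post.

```latex
% Requires amsmath (for \text).
% Shared structure: both occurrences are the same x, so they vary together.
\[
  \{\, x + x : x \in \{1, 2\} \,\} = \{\, 1+1,\ 2+2 \,\} = \{2, 4\},
  \quad\text{i.e. } 2 \vee 4 .
\]
% Unshared structure: x and y are bound independently of each other.
\[
  \{\, x + y : x, y \in \{1, 2\} \,\} = \{\, 1+1,\ 1+2,\ 2+1,\ 2+2 \,\} = \{2, 3, 4\},
  \quad\text{i.e. } 2 \vee 3 \vee 4 .
\]
```

The value $3$ appears only when the two summands are allowed to differ, which is what sharing the structure rules out.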
There are two notions of substitution. “Binding substitution” or “unification” is used to make structures more specific. For example (x?⊤+x?⊤)∧(1+⊤) is unified to x?1+x?1 while (x?⊤+x?⊤)∧(1+2) becomes x?⊥+x?⊥ because 1∧2 becomes the contradictory structure.
Actual true substitution is rare in maths. Rarely does one substitute 2 for 1 in an expression like 1+1 to give 2+2 though there can be systems where some structure holding 1 is regarded as a default value that can be overridden. The result of substituting 2 for 1 in 1+1 depends on whether the two 1s are the same structure or not - 1 is not substituted for, instead a maybe shared structure bound to 1 gets rebound.
Rod, does any of that suggest an answer to my question?
Rod, does any of that suggest an answer to my question?
It does to your question “What is a variable” :)
But to your more elaborate question it seemed like you were complaining about “variables” that played the role of something like labels, flags, or indices into structures without being “true variables” that can get bound. I just wanted to mention an alternative way of thinking about variables, binding, and substitution which might fit your problem if I understood the math enough, however I got a little carried away in my typing.
And for those who are becoming Hegelian, I wanted to sneak in the opposition between “totally contradictory” and “completely unknown” :), though it doesn’t seem to involve adjoints.
I’m sorry.
My question wasn’t really “what is a variable”; that was just the best short-ish title for this kind of rambly question that I could think of. I don’t think I was complaining about “variables that can’t be bound”, but maybe I was; can you explain further? What does your description have to say about notations for derivatives?
Hello Mike,
you seem to suggest that these issues are resolved if we interpret x as a function instead of a variable, a point of view I sympathize with. Unfortunately I don’t quite understand the arguments against this step. You seem to mention two:
But while logically consistent, this seems to undercut the force of the lesson of dummy variables, since we are endowing x with a special status not shared by t.
and
With $x$ as a function we can also write “$f=x^2+1$”, which again is something that I’m used to indoctrinating my students against.
Correct me if I’m wrong, but both of these arguments disappear if we interpret $x$ as a function: in the first case since the function $t$ might (or might not) be the same as the function $x$. (This seems in agreement with applications of calculus where different variables like time $t$ and position $x$ are not interchangeable.) The second argument is resolved since for a function $y=f(x)$ all of the symbols $y$, $f(x)$, $f$ and $y(x)$ now denote the same thing. The notation $f(x)$ would be a different notation for the composition $f\circ x$ (with $x$ usually denoting the identity function).
So maybe there are other arguments against this convention?
Another question: if $y$ and $x$ only denote variables, should the symbol $\frac{dy}{dx}$ denote a variable or a function?
Edit (after reading your question again): you also suggest that the “evil” notation is the primed one $y'$, which I agree with. Physicists have a convention of writing the dot, but only to denote derivatives with respect to time, so that doesn’t have the same ambiguity. Also, if $Df$ denotes the differential $df$ as Todd suggests, then that’s OK, since it has a different meaning from $\frac{df}{dx}$.
Cheers Michael
Sorry to enter this discussion so late. But Mike: it sort of looks like you’ve answered your own question back in #1! Is there a problem with just using Df notation?
It’s probably hard to tease apart for calculus students what is really going on, but suppose we start with a function $f\colon\mathbb{R}\to\mathbb{R}$ and apply the tangent bundle functor $(-)^D$ from SDG, keeping in mind the Kock-Lawvere axiom asserting an isomorphism $(\mathrm{pos},\mathrm{vel})\colon\mathbb{R}^D\to\mathbb{R}\times\mathbb{R}$, where the first map is evaluation at the point $1\to D$. Then notationally it seems quite alright to describe $f^D$ in terms of a mapping that is customarily denoted as
$$(p,v)\mapsto\bigl(f(p),\ Df|_p\cdot v\bigr)$$
and this carries over just fine to the multivariate setting. Maybe it’s even a really good idea to implant the thought that the derivative $Df|_p$ (or $Df(p)$ if you’d prefer) is not really to be thought of as just a “number” but as a linear function (which can be characterized by a number in the 1-dimensional setting, by setting $v=1$ and taking $Df(p)\cdot 1$)?
If this pedagogical point is to be driven home seriously, I suppose you could dismiss notations like $\frac{df}{dx}$ as archaisms, reflecting an earlier age when notions of functions, variables, etc. hadn’t been fully worked out.
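As a concrete sketch of this formula, take the illustrative choices $f(x)=x^2$ and $g(x)=\sin x$ (neither appears in the post). Functoriality of the tangent bundle construction, $(g\circ f)^D=g^D\circ f^D$, then reads off the chain rule.

```latex
% Requires amsmath. The tangent map acts on (position, velocity) pairs.
\begin{align*}
f^D(p, v) &= \bigl(f(p),\ Df|_p \cdot v\bigr) = \bigl(p^2,\ 2p\,v\bigr), \\
g^D(q, w) &= \bigl(g(q),\ Dg|_q \cdot w\bigr) = \bigl(\sin q,\ (\cos q)\,w\bigr), \\
(g \circ f)^D(p, v) &= g^D\bigl(f^D(p, v)\bigr)
   = \bigl(\sin(p^2),\ \cos(p^2)\cdot 2p\,v\bigr), \\
\text{so}\quad D(g \circ f)|_p &= Dg|_{f(p)} \cdot Df|_p = 2p\cos(p^2).
\end{align*}
```

Nothing in this computation refers to the name of an input variable, which is the sense in which the $Df$ notation sidesteps the objection to $\frac{df}{dx}$.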
Is what Todd denotes with Df the same as df the differential of the function f?
Michael_B: as far as I’ve always seen it used, if $f\colon M\to N$ is a smooth mapping and $p\in M$, then $Df(p)\colon T_p(M)\to T_{f(p)}(N)$ is the derivative mapping at $p$, also sometimes called the Jacobian at $p$. If $f$ is real-valued, then all the tangent spaces $T_{f(p)}(\mathbb{R})$ are canonically identified with $\mathbb{R}$, and the resultant linear functional $Df(p)\colon T_p(M)\to\mathbb{R}$ is an element in the cotangent space of $M$ at $p$, also denoted $df(p)$ as you say.
@Michael, you seem to be saying that I should just give up on indoctrinating my students not to write $f=x^2+1$, and on teaching them that $f(x)=x^2$ and $f(t)=t^2$ define the same function? I think there is an important point to distinguish between the values of a function and the function as an object in its own right. Only once you really understand that are you justified in failing to notate it.
Hello Mike. I guess that’s what I’m saying :) Admittedly, I’m not sure it’s a valid solution to the problem, neither mathematically nor pedagogically. But it seems to me that it corresponds more to the common practice in calculus.
Already a statement like y=f(x) is mostly read as “y is a function of x”, and less often as “y is the value of the function f at input x”. (At least among engineers and physicists). Or I have seen statements like “suppose V(t) is the volume of water at time t” together with a plot where the vertical axis is denoted with V. Without introducing an additional variable to denote the value of the function V.
Or consider when calculus textbooks discuss cartesian coordinates (x,y) and polar coordinates (r,θ) in the plane. The easiest way for me to think about this is by interpreting x,y,r,θ as functions on the plane. Wouldn’t it be cumbersome to introduce additional names to denote the functions of which x,y,r,θ are the values?
Unfortunately I don’t yet see clearly what trouble such a point of view causes. (let’s restrict to mathematical problems since the pedagogical ones are hard to predict)
Probably we do have to distinguish between the function and its value sometimes, but can’t we still do this say by obtaining the value of x at 3 by precomposing with the constant function 3?
I agree entirely that it doesn’t cause any problems mathematically. My point is entirely a pedagogical one.
Like you, Mike, I try to teach my Calculus (and Algebra) students the difference between a function and its value, even though the textbooks do what they can to undermine my efforts. Of course, it is important to see the abuses of notation that they're liable to meet in applied fields, but (as you say) they have to understand the correct way first. So I tell them that the textbook abuses notation, but I try to never abuse it myself. In particular, I tell them that y′ is ambiguous, so I don't use it; they are allowed to (since the book does) but I recommend against it. I use f′(x) and dy/dx, as you suggest, instead. (Besides that, Df(x) is all right, too; but since our book doesn't use it, I only mention it in passing.)
As for $df/dx$, this is a bit subtle; $df(x)/dx$ is just fine; in fact, if $y=f(x)$, then $dy/dx=df(x)/dx=f'(x)$. But that doesn't make $df/dx$ a legitimate synonym of $f'$! It's a matter of logic; like Rod #12 said, the two $x$s stand for the same thing, so you can't substitute for one without substituting for the other. Even this is subtle; if you substitute, say, $3$ for $x$ in $f'(x)=df(x)/dx$, then you get $f'(3)=df(3)/d3$, which doesn't quite work. On the other hand, if you substitute $t$ for $x$ instead (a change of variable), then $f'(t)=df(t)/dt$ is fine. This works better with differentials; $df(x)=f'(x)\,dx$ becomes $df(3)=f'(3)\,d(3)$, which is fairly trivial but at least correct. The problem is that, in writing $df(x)/dx$, you're tacitly assuming that $dx$ is nonzero, that is that $x$ is a variable quantity. (This is the origin of the term ‘variable’, I believe, even though we now use that also for symbols that stand for constants.) So you should only substitute something variable for it. (But in $df(x)=f'(x)\,dx$, no such assumption is being made.) This is really no more mysterious than why you can't substitute $1$ for $x$ in $(x^2-1)/(x-1)$.
What does it mean for x to stand for a variable quantity? Doesn't it stand for a real number? Yes … but it stands for a variable real number. This brings us to the question in the title: what is a variable? Lawvere said that a variable can be any morphism in any category; then a variable real number (aka a real-valued variable or simply a real variable) is a morphism whose target is the space of real numbers. For some reason, the only field in which this penetrates the undergraduate curriculum is statistics; there, they know that a random variable is a measurable function on a measurable space (typically valued in the measurable space of real numbers with Borel measure), a morphism in the category of measurable spaces and measurable functions (or something like it). Even in an elementary treatment where every random variable is defined on a finite space and the word ‘measurable’ is never uttered, they still give a definition of ‘random variable’. In Calculus, we usually study smooth variables (or smoothly varying quantities), which are morphisms in the category of smooth manifolds and smooth functions (or something like it). This is actually the first lesson in my Applied Calculus course: what a smooth variable is (very roughly, of course). (In the regular Calculus course, we don't usually assume that everything is smooth, so I have to bring this in later.)
So x, y, t, u, etc are all variables (usually smooth ones). What then is f? It's a function, but I mean ‘function’ in the sense used in elementary algebra, that is a partial function (a partial morphism in the category of sets) from ℝ to ℝ, usually a smooth one (so infinitely differentiable wherever defined). Quantities like f(x) are defined at the formal level by composition; we don't write this as f∘x because we conceptually distinguish variables (with arbitrary unspecified domain) from functions (with domain a specified subset of ℝ). So (pace Michael #21) x, y, r, and θ may indeed be functions on the plane, but we're not treating them in the same way as f, so they use different notation. (And then they might not be functions on the plane; if you're really studying the motion of a particle in the plane, they might be better thought of as functions of time.)
I won't even get into the problems with notation for second derivatives and multivariable calculus. In general, all of this stuff works better with differentials than with derivatives (right down to the terms ‘differential’ and ‘derivative’), but I've said enough for now.
@Toby: Interesting comments.
Just a few questions
1) How do you bring the point across to students that sometimes x stands for the value of a function (or constant, as in your first paragraph) and sometimes x represents a variable quantity (morphism in a category) as in your paragraphs 3 and 4? Especially if we use the same symbol.
2) Why is it important to make a conceptual distinction between functions with unspecified domains (like $x,t,r$) and functions with domains subset of $\mathbb{R}^n$ like $f$, and use different notation for their compositions?
Another remark: from the perspective of interpreting $x,y,t$ etc. as morphisms in a suitable category, a common abuse of notation I see is that a lot of times the pullback of a function along a map gets denoted with the same symbol. So for example when describing motion of a particle in the plane by saying $x,y$ are functions of time, I’d interpret this as saying $x,y$ are actually the pullbacks of the standard coordinates $x,y$ on the plane to the 1-dimensional manifold representing time, along the map given by the motion.
I think I have a partial answer to 2): functions with values in $\mathbb{R}$ but unspecified domains cannot be composed with one another, but we may always compose them with functions from $\mathbb{R}$ to itself. Is that the reason?
Thanks Toby! I think you’ve resolved my confusion by saying that the manifold in question should be an arbitrary one (the domain of generalized elements), not even necessarily 1-dimensional. I hadn’t thought about df(x)/dx, but you’re right that that makes perfect sense.
My explanation of substitution would have been a bit different. I would say that when x is a variable, then dx is a new variable, albeit one which happens to be related to x in a certain way. (In other words, we now consider generalized elements of the tangent bundle Tℝ rather than the manifold ℝ.) So substituting 3 for x doesn’t make dx into d(3). Rather, it just means that instead of dx being a small variation about x, it is a small variation about 3.
I would really like to hear what you have to say about second derivatives and multivariable calculus, if you have time sometime to share it. I have recently discovered another problem with the f′ notation: prime is a very small symbol and hard to see from the back of the room! (-:
The second derivative is not really a notion that goes well with differential forms. After all, $d^2=0$! In the one-variable case one can cheat and observe that the tangent bundle is generated by a global section $dx$, so there is a unique function $\frac{df}{dx}$ such that $df=\frac{df}{dx}\,dx$, and since $\frac{df}{dx}$ is a function, we can talk about its derivative again. (Note, we can do this equally well on the real line or the circle!) This cheat can be made to work in the many-variable case provided the tangent bundle is trivial, but it’s far more obvious that one is doing some very coordinate-dependent things then. (One thing that is always very confusing is that the partial differential operator $\frac{\partial}{\partial x_i}$ depends on the choice of the other coordinates as well, despite the notation! This is in contrast to the differential 1-form $dx_i$, which depends only on the coordinate function $x_i$.)
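A small example of that last point, as a sketch: the coordinate systems below are chosen for illustration and do not come from the post. On the plane take coordinates $(x,y)$ and also $(x,w)$ with $w=x+y$. The first coordinate function is the same in both systems, yet the operator written $\partial/\partial x$ differs. The subscript records which other coordinate is held fixed.

```latex
% Requires amsmath (for \text).
% Coordinates (x, y): differentiate the function y while holding y fixed.
\[ \left(\frac{\partial y}{\partial x}\right)_{y} = 0 \]
% Coordinates (x, w) with w = x + y: now y = w - x, and w is held fixed.
\[ \left(\frac{\partial y}{\partial x}\right)_{w}
     = \frac{\partial (w - x)}{\partial x} = -1 \]
% The 1-form dx is determined by the function x alone, in either system.
\[ dy = dw - dx, \qquad \text{while $dx$ is the same form in both systems.} \]
```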
I think one is supposed to think about jet bundles if one wants to do higher-order derivatives. But that seems a step too far for first-year calculus.
Certainly jet bundles are themselves a step too far for first-year calculus, but then so are tangent bundles. The trick would be to find a way to have them in the background causing things to make sense, but not needing to be mentioned explicitly.
The most direct thing to see is the iterated tangent bundle: if dx is a variable element of Tℝ, then d(dx) is a variable element of T(Tℝ). But unfortunately that is a bit bigger than the jet bundle…
I think that the next time I teach calculus, I’m going to bring in differentials and linear approximations much earlier. This time I waited to use them as an explanation for the chain rule, but then once we had them I found that I liked using them for everything else. (Thanks Toby for stressing this point, here and elsewhere!) From that point of view, the ordinary differential is determined by a linear approximation
$$f(x+dx)\simeq f(x)+d(f(x))$$
to first order in $dx$, so an appropriate meaning of “second differential” would be a quadratic approximation
$$f(x+dx)\simeq f(x)+d(f(x))+\frac{1}{2!}\,d^2(f(x))$$
to second order in $dx$. And with this meaning of $d^2(f(x))$ we do have a literal quotient
$$f''(x)=\frac{d^2(f(x))}{dx^2}.$$
I haven’t yet worked out the best way to explain “first order in $dx$”, though. And I don’t know what I would say to explain the $\frac{1}{2!}$, either.
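For a concrete check of these approximations, here is a sketch with the illustrative choice $f(x)=x^2$ (not taken from the post), where the quadratic expansion happens to be exact.

```latex
% Requires amsmath. First and second differentials of f(x) = x^2.
\begin{align*}
f(x + dx) &= (x + dx)^2 = x^2 + 2x\,dx + dx^2, \\
d(f(x)) &= 2x\,dx, \\
\tfrac{1}{2!}\,d^2(f(x)) &= dx^2
  \quad\Longrightarrow\quad d^2(f(x)) = 2\,dx^2, \\
\frac{d^2(f(x))}{dx^2} &= 2 = f''(x).
\end{align*}
```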
Zhen Lin #27 has succinctly pointed out the problems.
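Here is a more explicit problem with the usual notation for second derivatives: it breaks the chain rule. If $y=f(u)$ and $u=g(x)$, then
$$\frac{dy}{dx}=\frac{dy}{du}\,\frac{du}{dx}$$
works out fine; it gives
$$(f\circ g)'(x)=f'(g(x))\,g'(x)$$
as it should. But (using the proposed second differential from Mike #29, which is implicitly endorsed by the notation $\frac{d^2y}{dx^2}$)
$$\frac{d^2y}{dx^2}=\frac{d^2y}{du^2}\,\frac{du^2}{dx^2}$$
is no good; it gives
$$(f\circ g)''(x)=f''(g(x))\,g'(x)^2$$
which is incorrect. The correct formula is
$$(f\circ g)''(x)=f''(g(x))\,g'(x)^2+f'(g(x))\,g''(x).$$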
For this reason I never write in class (except once, about the same time that I write , to warn against it); I write (or even , when it comes naturally) instead.
More simply (but you might not believe it if I make it so simple right away), implies , which implies , and that's not what we want at all.
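A concrete instance makes the failure visible. The functions $f(u)=u^2$ and $g(x)=x^2$ are illustrative choices made here, so that $y=f(g(x))=x^4$.

```latex
% Direct computation: y = x^4.
\[ \frac{d}{dx}\!\left(\frac{dy}{dx}\right) = \frac{d}{dx}\bigl(4x^3\bigr) = 12x^2 \]
% Naive "cancellation" of second differentials, with f''(u) = 2 and g'(x) = 2x:
\[ f''(g(x))\,g'(x)^2 = 2\,(2x)^2 = 8x^2 \]
% Adding the term involving g'', with f'(u) = 2u and g''(x) = 2:
\[ f''(g(x))\,g'(x)^2 + f'(g(x))\,g''(x) = 8x^2 + (2x^2)(2) = 12x^2 \]
```

The naive formula falls short by $4x^2$, which is precisely the $f'(g(x))\,g''(x)$ term.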
@ Michael #24:
Why is it important to make a conceptual distinction between functions with unspecified domains (like $x,t,r$) and functions with domains subset of $\mathbb{R}^n$ like $f$, and use different notation for their compositions?
Partly because function notation is so convenient, yet it requires a domain, and sometimes we don't want to specify it (or even to imply that it's a subset of , much less a subset of in first-term Calculus). So I want to write and ; of course, I can also write and … but I can also write , while I can't write . I guess that this is basically what you said, the ability to compose functions.
The distinction is especially relevant in applications. Here, the domain of the variables is a vaguely unspecified space of states of the world. The textbooks sometimes encourage us to identify this space (often it can be identified with the time line, for example), so that there is a single independent variable (or a few in the multivariable case) of which every other variable is a function. But the whole point of the Chain Rule is that the independent variable is irrelevant! It's sufficient that some choice of independent variables is possible (that is that some space of states can be assumed to exist), but it's completely unnecessary to actually make this choice. So I don't actually want to say that etc are functions at all (to the students, for whom a function is defined on a subset of some ). Yet functions like are also around, and I want to refer to them from time to time too.
How do you bring the point across to students that sometimes $x$ stands for the value of a function (or constant, as in your first paragraph) and sometimes $x$ represents a variable quantity (morphism in a category) as in your paragraphs 3 and 4? Especially if we use the same symbol.
What's happening at the most basic level is a change of context (in the technical sense as in type theory). There is the general context where the variables are allowed to vary as much as they may, and then there is the more specific context where is set to (say) . (There are other intermediate contexts, especially in multivariable Calculus, such as that given by a constraint as in optimization problems.) Assuming for the sake of argument (but this is hardly necessary) that there is exactly one possible state of the world in which , then we have a morphism from the point to the space of all world-states; as you said (but I didn't quote), we are taking a pullback along this morphism and abusing notation by keeping the same symbol .
I tell my students, particularly when working out word problems, to keep careful track of the context. (I use the word ‘context’ but don't let them suspect that it's a technical term in logic!) In a typical problem, they have an equation that holds always, which they may differentiate; but then later they use equations that are only true for an instant. (Related rates and optimization are two broad categories of problems like this.) I tell them that any result from the equations that hold always is also true for an instant, but not conversely (which I think makes intuitive sense); and you cannot differentiate equations that only hold for an instant, because nothing is changing in that instant! (In multivariable Calculus you can differentiate equations that hold under a constraint, and this leads for example to Lagrange multipliers, but you still have to remember that you are working relative to a constraint.) So basically, I'm allowing them to pull back results along a morphism but not push them forward; but I try to make it sound like common sense instead of a theorem of categorial logic!
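A standard related-rates computation illustrates the rule. The ladder setup and all numbers below are illustrative assumptions, not taken from the post: a ladder of length $5$ leans against a wall, with its foot at distance $x$ from the wall and its top at height $y$.

```latex
% Holds at every instant, so it may be differentiated:
\[ x^2 + y^2 = 25 \quad\Longrightarrow\quad 2x\,dx + 2y\,dy = 0 \]
% Holds only at one instant (assumed data): x = 3, y = 4, dx/dt = 2.
% Pull the always-true result back to that instant:
\[ 2(3)(2) + 2(4)\,\frac{dy}{dt} = 0
   \quad\Longrightarrow\quad \frac{dy}{dt} = -\frac{3}{2} \]
% Differentiating the instantaneous equation x = 3 would instead give dx = 0,
% contradicting the assumed rate dx/dt = 2.
```

The order of operations matters: differentiate in the general context first, and only then specialize to the instant.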
the partial differential operator $\frac{\partial}{\partial x_i}$ depends on the choice of the other coordinates as well, despite the notation!
In a thermodynamics course, I learnt the notation for the partial derivative of with respect to when is held constant. (In general, there are subscripts.) In my multivariable class, I introduce this notation first, then say that we can drop the subscripts as an abuse of notation when it's obvious what they're going to be. Of course, all of the abuses of notation can be justified in this way (that it's obvious what it's supposed to mean), but only this one is really necessary, since otherwise it gets very tedious.
By the way, Mike, the corresponding notation for the partial derivatives of a function is (where ); it works just like in hilbertthm90 #2, only it's legitimate. An alternative is (especially if you want to leave subscripts free for a sequence or other family of functions). This extends to higher-order derivatives just fine, without having to assume commutativity (the claim that ).
@Toby #30: very interesting, thanks! How about this for a second try at “second differentials”?
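The general form of a second-order approximation should include not only a first-order change in the variable but also an independent second-order change. That is, let $dx$ be a first-order infinitesimal and $d^2x$ a second-order one, and work up to second-order; thus $dx^2$ is relevant but $dx^3$ and $dx\,d^2x$ and $(d^2x)^2$ can be neglected as being third- or fourth-order (or are equal to zero, depending on your preferred flavor of infinitesimal). Thus while it is true that
$$f(x+dx)\simeq f(x)+f'(x)\,dx+\tfrac12 f''(x)\,dx^2$$
we also have (by the same reasoning)
$$f\bigl(x+dx+\tfrac12 d^2x\bigr)\simeq f(x)+f'(x)\,dx+\tfrac12 f'(x)\,d^2x+\tfrac12 f''(x)\,dx^2$$
and it is both the third and the fourth terms here that should be called $\tfrac12 d^2(f(x))$, since they are both second-order. That is, we write
$$f\bigl(x+dx+\tfrac12 d^2x\bigr)\simeq f(x)+d(f(x))+\tfrac12 d^2(f(x))$$
and therefore
$$d^2(f(x))=f''(x)\,dx^2+f'(x)\,d^2x.$$
Now if $y=f(u)$ and $u=g(x)$, we have
$$d^2y=f''(u)\,du^2+f'(u)\,d^2u=f''(g(x))\,g'(x)^2\,dx^2+f'(g(x))\,\bigl(g''(x)\,dx^2+g'(x)\,d^2x\bigr).$$
And matching this with
$$d^2y=(f\circ g)''(x)\,dx^2+(f\circ g)'(x)\,d^2x$$
we recover the correct chain rules for both first and second derivatives.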
I suspect that this could be written in terms of jet bundles.
Of course, now we’ve lost the notation $\frac{d^2y}{dx^2}$ for the second derivative. Unless there’s some reason why we can assume $d^2x = 0$.
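The rule that the second differential of $y = f(x)$ picks up a term in $d^2x$ as well as one in $dx^2$ can be checked mechanically. The following is a minimal sketch in plain Python, not something from the thread: it assumes a quantity is recorded by its value together with its first and second derivatives along some curve (standing in for $x$, $dx$ and $d^2x$), and the class name `Jet2`, the test function `cube` and the numerical values are all hypothetical. Only the Leibniz rule is built in; the formula $d^2y = f''(x)\,dx^2 + f'(x)\,d^2x$ is then compared against what the arithmetic produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Jet2:
    """A quantity along a curve: its value and first two derivatives at one instant."""
    v: float   # the quantity itself
    d1: float  # plays the role of dx
    d2: float  # plays the role of d^2 x

    def __add__(self, other):
        return Jet2(self.v + other.v, self.d1 + other.d1, self.d2 + other.d2)

    def __mul__(self, other):
        # Leibniz rule: (uv)' = u'v + uv' and (uv)'' = u''v + 2u'v' + uv''
        return Jet2(
            self.v * other.v,
            self.d1 * other.v + self.v * other.d1,
            self.d2 * other.v + 2 * self.d1 * other.d1 + self.v * other.d2,
        )


def cube(x):
    """f(x) = x^3, built only from multiplication."""
    return x * x * x


x = Jet2(v=2.0, d1=3.0, d2=5.0)  # hypothetical values of x, dx, d^2x
y = cube(x)

# For f(x) = x^3 we have f'(x) = 3x^2 and f''(x) = 6x.
dy_formula = 3 * x.v**2 * x.d1
d2y_formula = 6 * x.v * x.d1**2 + 3 * x.v**2 * x.d2

# With d^2x = 0 only the f''(x) dx^2 term survives.
y_flat = cube(Jet2(v=2.0, d1=3.0, d2=0.0))

print(y.d1, dy_formula)    # both 36.0
print(y.d2, d2y_formula)   # both 168.0
print(y_flat.d2)           # 108.0, which is f''(x) dx^2 alone
```

Under these assumptions the quotient of `y.d2` by `x.d1**2` equals $f''(x)$ only in the `y_flat` case, which is one way to see why the quotient notation stops working once $d^2x$ is allowed to be nonzero.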
@Toby: thanks for the detailed answer!
Concerning Mike's nice reasoning in #33:
Of course, now we’ve lost the notation $\frac{d^2y}{dx^2}$ for the second derivative.
If I understood this correctly, you might still “save” it by using the notation , or should it be ?
@Michael: I didn’t use the notation ; what are you thinking that it would mean? The problem that I saw is that is not a linear function of alone, but also of .
@Toby, I have a couple of terminological questions for you.
1) What do you call “”? In Lawvere’s parlance it is still a “variable quantity”, but as it is not syntactically a variable I wouldn’t want to call it that. But neither is it a “function” in your setup, if I understood correctly.
2) What do you call an object like “” which you can take the integral of?
$d^2y = f'(x)\,d^2x + f''(x)\,dx^2$.
I agree; I first got this by writing $dy = f'(x)\,dx$ and applying the product rule. Notice that the $d$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.
Of course, now we’ve lost the notation $\frac{d^2y}{dx^2}$ for the second derivative.
Right, and I don't know how to rehabilitate that notation, because of my last remark in #30.
However, you can write $\frac{\partial^2 y}{\partial x^2}$ instead! This is because, just as $\frac{dy}{dx}$ is the coefficient on $dx$ in an expansion of $dy$, so $\frac{\partial^2 y}{\partial x^2}$ is the coefficient on $dx^2$ in an expansion of $d^2y$. (ETA: Michael already noticed this in #34, but hopefully my explanation of it helps.)
By the way, I also tell my students that $d$ binds more tightly than any non-differential operation like squaring, so I can write $dx^2$ instead of $(dx)^2$. (Then I always use parentheses in something like $d(x^2)$.)
Answering Mike's questions in #36:
I call simply a quantity. I might even call it a real number (but not a constant one) or even simply a number, but ‘quantity’ usually works well (except in applications to supply and demand, where ‘quantity’ has a more specific meaning). It is a variable quantity, of course, and I might even point out that it varies or may even say ‘ is variable.’, but not ‘ is a variable.’; that would be confusing. (In other words, when describing the quantity as a whole, I would use ‘variable’ only as an adjective.) Also, I will say that is a function of ,1 which simply means that there exists a function such that . If I'm emphasizing the logical form, then I may also call it an algebraic expression; technically, the expression ‘’ represents or stands for the quantity .
Sometimes I call an infinitesimal quantity, if I want to emphasize its interpretation as something infinitely small (and similarly I might call a finitesimal quantity). But if I'm talking about things that one can integrate, then I usually call it a differential form. I actually introduce that term fairly early, when I remark that the differential of any (finitesimal) expression (in any number of variables!) will be a differential form; I point out that every term has a differential as one factor, note that this makes every term (and hence the sum) an infinitesimal quantity, and then I introduce the name for expressions of this form. (I also remark that the differential form has rank because only factor of each term is a differential, but we don't have to say that since differential forms of higher rank are only used in multivariable Calculus. And then in my multivariable class, I use them!)
In case your browser fails to render <ins> in combination with MathML (as mine does), the ‘’ should also be underlined (assuming that your browser renders <ins> as underlining, which they nearly all do). This is just for emphasis. ↩
Toby in #37 wrote:
I first got this by writing $dy = f'(x)\,dx$ and applying the product rule. Notice that the $d$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.
Interesting! Would you mind telling a bit more about this $d$ as a commutative operator and how the equation $d^2y = f'(x)\,d^2x + f''(x)\,dx^2$ follows from the chain rule. Studying synthetic differential geometry is still on my TODO list, so apologies if this is standard knowledge among experts.
I also still need to understand Mike's computation in #33. Intuitively I would have thought that a first order infinitesimal is also an infinitesimal of second order, so I find it confusing that we need to include the first order change separately when looking at the effect of a second order change. (And I also have no intuition for what it means that the two changes are independent, from a geometric or physical perspective).
Edit: I’m also still curious if Mike's suggestion from #26 can be made consistent with the different interpretations of suggested above
So substituting 3 for doesn’t make into . Rather, it just means that instead of being a small variation about , it is a small variation about 3.
So would it then be correct to also write ?
Intuitively I would have thought that a first order infinitesimal is also an infinitesimal of second order
Actually, it’s the other way around: a second order infinitesimal is also a first order one (although to first order, it’s zero). Higher order means a smaller number.
I first got this by writing $dy = f'(x)\,dx$ and applying the product rule. Notice that the $d$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.
Interesting! Would you mind telling a bit more about this $d$ as a commutative operator and how the equation $d^2y = f'(x)\,d^2x + f''(x)\,dx^2$ follows from the chain rule.
I don't know very much about that operator; I asked about this stuff once on MathOverflow and got no clear answer, although I did get a reference that I haven't followed up yet. But if I just assume that it continues to obey the usual rules, then I can calculate with it just fine. In this case:
$$d^2y = d(dy) = d(f'(x)\,dx) = d(f'(x))\,dx + f'(x)\,d(dx) = f''(x)\,dx^2 + f'(x)\,d^2x.$$
So would it then be correct to also write ?
Apparently so! But of course is simpler.
@Mike #40:
Actually, it’s the other way around: a second order infinitesimal is also a first order one (although to first order, it’s zero). Higher order means a smaller number.
Here’s how I was thinking: a first order infinitesimal number is one $x$ with $x^2 = 0$, a second order is one with $x^3 = 0$. So first order is also of second order. Actually when I picture infinitesimal neighborhoods of a point or subset in a manifold or whatever space, I always thought that they increase as the order increases. Did I get it wrong or are we maybe talking of dual things?
@Toby #41: I recall that question on MathOverflow (one of the comments was mine). Unfortunately I also have not had the time to follow up on the references. But I do find the infinitesimals and differentials approach advocated by Dray and Manogue to be worthwhile for teaching calculus. And since Mike arrived at the same equation as you by slightly different reasoning, it is even more compelling to believe that maybe there is still something to be understood in the interpretation of or .
Yes, I think we are using language in dual ways. I’m thinking of nonstandard-analysis-style infinitesimals, whose square or cube is never actually equal to zero. Instead I’m saying, let’s fix some particular “scale” infinitesimal $\epsilon$; then a first-order infinitesimal $x$ is one such that $x/\epsilon$ is finite (“limited”), a second-order one is such that $x/\epsilon^2$ is finite, etc.
Then when we work “up to first order”, which we could formalize as being in the quotient ring of limited numbers modulo $\epsilon^2$, the square of a first-order infinitesimal can be neglected. And when we work “up to second order”, i.e. in the limited numbers modulo $\epsilon^3$, the cube of a first-order infinitesimal and the square of a second-order one can be neglected. So in the latter quotient ring, a first-order $x$ has $x^3 = 0$ while a second-order one has $x^2 = 0$.
I think that this matches the use of phrases like “to first order” and “a first order change” in ordinary (non-infinitesimal) language better. A second order change is negligible if we are working to first order, but not if we are working to second order, yet the amount of the change itself is the same in both cases; what changes is our attitude towards it. But I guess it doesn’t apply as well to SDG-style nilpotent infinitesimals, so with those it may be better to avoid terms “first order” and “second order” and talk instead about “nilsquare” and “nilcube” etc.
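To make "working up to second order" concrete, here is a toy model in plain Python. It is only a sketch under simplifying assumptions: quantities are polynomials $a_0 + a_1 e + a_2 e^2$ in a formal scale infinitesimal $e$ with ordinary real coefficients (standing in for limited numbers), and every term in $e^3$ or higher is discarded. The class name `Trunc` and the sample values are hypothetical.

```python
class Trunc:
    """a0 + a1*e + a2*e^2 in a formal scale infinitesimal e, with e^3 and higher discarded."""

    def __init__(self, a0=0.0, a1=0.0, a2=0.0):
        self.c = (a0, a1, a2)

    def __mul__(self, other):
        a, b = self.c, other.c
        # Multiply the polynomials and keep only the terms below e^3.
        return Trunc(
            a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[1] * b[1] + a[2] * b[0],
        )

    def is_zero(self):
        return all(k == 0 for k in self.c)


first = Trunc(0.0, 2.0, 0.0)   # 2e: a first-order infinitesimal
second = Trunc(0.0, 0.0, 5.0)  # 5e^2: a second-order infinitesimal

print((first * first).is_zero())          # False: the square of a first-order change still counts
print((first * first * first).is_zero())  # True: its cube is neglected
print((second * second).is_zero())        # True: the square of a second-order change is neglected
print((first * second).is_zero())         # True: the mixed product is third-order
```

In this model the same element `second` would already vanish if terms in $e^2$ were discarded as well, which matches the remark that what changes between "first order" and "second order" is the attitude towards a change, not the change itself.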
Mike, thanks for the explanation! I’ll have to think about it some more to resolve the conflicting views in my head.
In the meantime, here is something slightly related to the original question. In my calculus class last week I asked the students to answer the following questions
Compute the derivative of:
with respect to
with respect to
with respect to
(the last one is an indefinite integral, I’m using the notations of my calculus book here (Hughes-Hallett))
That caused a lot of confusion for my students. My preliminary reaction is to think of the notation for the indefinite integral as the bad guy.
I wouldn’t ask my students (2) or (3). In fact, I’m not sure what you were expecting.
In (2), are you assuming that is a function of ? Or constant with respect to ? Were you hoping that they would write ? While there’s technically no contradiction in using the same variable both free and bound, it’s bad style even in published mathematical papers, so I wouldn’t want to inflict it on calculus students.
As for (3), the way I’m used to thinking of it is not a function but a class of functions (differing by local constants) — hence not something you can take the derivative of.
However, you do raise an important point, which is that is bound in but (sort of) free in . I’m curious to hear Toby’s take. I introduced indefinite integrals to my class last week by saying that the indefinite integral of a thing (I didn’t say “differential form”, but I might have as Toby suggested) is the most general expression whose differential is that thing. That made perfect sense to me.
I haven’t done definite integrals yet, but from that point of view, maybe the problem is with the definite integral notation, since we have the limits and specified but without indicating in the notation which variable is supposed to take on those values. For instance, in a chain rule / substitution problem, say we have , which we can solve by letting so that and
But this equality (of differential forms) is not something to which we can apply the “operation” and get
Instead we have to put and into and get
So maybe it would be better to write (as we do with summation notation, ) so that we could have
In (2), are you assuming that is a function of ? Or constant with respect to ?
Good point. But before I answer (and let me know if you see this differently): the discussion so far showed that there are at least two popular interpretations for “variables” in calculus: one in the sense of “dummy variables” or placeholders for numbers, and one in the sense of “variable quantity” or maybe morphism in a suitable category. It also seems that these two interpretations can lead to conflicts. But I’d be glad to understand this better still.
Having said that:
For 2) I was expecting that they answer . From the “dummy variable” perspective is a placeholder for a number (representing the upper boundary and otherwise not related to ), and since the variable is bound the whole integral does not “change” when we plug in different values for , so the derivative is zero.
From the “variable quantity” perspective might depend on so the correct answer would be as you suggest . So to be consistent with the previous we need to assume is constant with respect to .
But some things are not yet clear to me about this last answer. I’ll come to it in a moment.
As for 3) I was expecting that they answer . I also think of the indefinite integral as a family of functions (depending on the same variable as the differential form). I guess I’m using the convention here that taking the derivative of a family of functions means taking the derivative of each member of the family. Of course in principle the additive constant could still depend on in some context, which makes things more subtle.
…maybe the problem is with the definite integral notation, since we have the limits and specified but without indicating in the notation which variable is supposed to take on those values.
I think the standard convention here is that the boundaries of the definite integral always refer to the variable appearing in (or etc.) so there is seldom ambiguity there. But as you suggest I also emphasize this by writing instead of . In fact I sometimes overemphasize by writing , which brings me back to 2).
If I had written and interpret variables as “variable quantities”, then how should I interpret the equality appearing in the upper boundary? Does it mean that and are the same variable quantities? In that case the answer to 2) would be the same as the answer to 3) and it wouldn’t be possible to ask if is constant with respect to (also a student might object that it is unnecessary to introduce a new name to denote the same thing as , which nevertheless is considered bad style as you mention). But I suspect that the thing going on here and elsewhere in the “variable quantity” perspective is that an equality like is interpreted in a way more commonly seen in probability/statistics as in denoting “the set of all states where the random variables and assume the same value.”
This raises some (maybe sidetracking) questions for me:
if the “variable quantity” perspective can be formalized via arrows in a suitable category, then how does one formalize categorically the notion of two quantities (arrows) being “independent” or “constant” with respect to each other?
What does the “set of states of the world” (also mentioned by Toby) correspond to categorically? Some classifying object?
These questions are not directly addressed at Mike or Toby, but if you happen to know some answers I won’t complain. :) Apologies if I can’t respond in the next few days.
Yes, I’m perfectly aware of the standard convention, and I agree that in practice there is no ambiguity in the meaning of a particular definite integral expression, but I described a situation (integration by substitution) in which the lack of notation could be problematic for a student when manipulating several such expressions.
As for the meaning of , I think more generally one of the things we can do with a “variable quantity” is to let it be equal to a particular other quantity. If the other quantity is constant, then it “stops varying” and becomes constant, while if the other quantity is also variable then their variation becomes dependent. For instance, when and are variable quantities and we write for what if we might also write as .
Categorically, variable quantities are morphisms from some domain object say — which I think is what Toby meant by the space of “states of the world” — and setting two such variable quantities equal would correspond to restricting the domain to the equalizer of those two morphisms. That’s probably the same as what you mean by ?
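The equalizer description can be tried out on a small finite example. The sketch below is plain Python and makes several assumptions that are not in the thread: the "space of states" is a hypothetical finite set of pairs, the quantities `x`, `t` and `y` are ordinary functions on it, and `equalizer` is an illustrative helper name. Setting two quantities equal restricts the domain; every other quantity is then simply evaluated on the smaller domain.

```python
from itertools import product

# A hypothetical finite space of states: all pairs (a, b) with a, b in 0..3.
states = list(product(range(4), repeat=2))


# Variable quantities are functions on the space of states.
def x(s):
    return s[0]


def t(s):
    return s[1]


def y(s):
    return x(s) ** 2 + t(s) ** 2


def equalizer(p, q, domain):
    """The part of the domain on which the quantities p and q take the same value."""
    return [s for s in domain if p(s) == q(s)]


diag = equalizer(x, t, states)                # impose x = t: the two now vary together
const3 = equalizer(x, lambda s: 3, states)    # impose x = 3: x stops varying

print(diag)                                   # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(sorted({y(s) for s in diag}))           # [0, 2, 8, 18], i.e. y = 2 x^2 on this sub-domain
print({x(s) for s in const3})                 # {3}
print(sorted({t(s) for s in const3}))         # [0, 1, 2, 3]: t is still free to vary
```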
I think this “fixing the value of a variable quantity” is the same thing that’s happening in a definite integral. Given a differential form like involving a variable quantity and its differential , we can integrate this form from one particular value of to another. These particular values might be constant quantities or other variable quantities (such as variables), and in the latter case the result is again going to be variable.
I need to think a bit about your first question.
I know what it means for a variable quantity to be constant: it means that it factors through . I’m not sure about “constant with respect to” some other quantity, though. Maybe that is one of those things which only makes sense if the quantity “with respect to” is part of a given basis, so that we can say that the corresponding partial derivative vanishes?
In the context of indefinite integrals, I read ‘’ as ‘antidifferential’, since is the differential of ; that is, is an antidifferential of . (Of course, when , the derivative of with respect to is , so is also an antiderivative of with respect to , like they say in the book.) I tend to avoid the term ‘indefinite integral’; it’s bad enough that (almost) the same notation is used for two different concepts (definite and indefinite integrals), and I'd just as soon not use (almost) the same terminology as well.
I've never liked the idea that the antidifferential of a differential form (or whatever you want to call that) is a set of quantities; I try to say ‘an’ instead of ‘the’ as much as possible. If you just want one antidifferential, then (for example) is OK; but if you want all of them, then you need . (I enforce the book's answers to its problems by saying that it's asking for all of them. And I say that it's only interested in quantities defined on a connected domain, so I don't have to deal with local constants.)
For definite integrals, I introduce the notation first as , where and are equations (preferably with unique solutions). Then is an abbreviation for (as Michael wrote) when the left-hand sides are the same; finally, is an abbreviation for when only one variable's differential appears in the expression for . That's mostly how they look, but I encourage them to use a longer form when doing integration by substitution, for the reasons that Mike gives. (This can violate the requirement that and have unique solutions. It's sufficient that the result of the integral be the same for any choice of solution, or at least for any choice where the solutions of and are connected.)
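The point about recording which variable the limits refer to can be illustrated numerically. This is only a sketch in plain Python: the integrand $2x\cos(x^2)$, the limits and the helper `integrate` (a midpoint rule) are assumptions chosen for illustration, not the example used earlier in the thread. With $u = x^2$, the limits $x = 0$ and $x = 2$ become $u = 0$ and $u = 4$; copying the bare numbers 0 and 2 across gives a different value.

```python
import math


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Integral from x = 0 to x = 2 of 2x cos(x^2) dx.
in_x = integrate(lambda x: 2 * x * math.cos(x * x), 0.0, 2.0)

# Substitute u = x^2, du = 2x dx: the limits x = 0, x = 2 become u = 0, u = 4.
in_u = integrate(math.cos, 0.0, 4.0)

# Copying the limits as bare numbers treats them as u = 0, u = 2 instead.
naive = integrate(math.cos, 0.0, 2.0)

print(in_x, in_u, math.sin(4.0))  # all three agree to several decimal places
print(naive, math.sin(2.0))       # a different number
```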
The Fundamental Theorem of Calculus has two parts, which are inconsistently numbered. By the numbering in our textbook (which is the way that I learnt it):
Since this is a theorem and needs fine print (about things being continuous and the like), I state and prove these first in function notation like the book does, but I bring up these forms eventually.
The variable is definitely free in both and , no ‘sort of’ about it. It's bound in , in , and in (which has a more complicated -free definition). I agree with Mike that is bound in , so you can't really differentiate it with respect to , although you could naïvely say that it's as Mike suggested. On the other hand, is fine; by definition, its differential is , so its derivative with respect to is . (You don't even need the FTC for this one.)
there are at least two popular interpretations for “variables” in calculus: one in the sense of “dummy variables” or placeholders for numbers, and one in the sense of “variable quantity” or maybe morphism in a suitable category
These two senses can both be incorporated into categorial logic. In the case of an expression like (which is an abbreviation of and is usually further abbreviated as ), we start with a real-valued quantity in some context (formally a morphism ). The equations and specify certain extensions of , categorially constructed as equalizers (as Mike suggested). Call these extensions and respectively; then if is any real-valued quantity in the context , is a real-valued quantity whose context is the product . (You should be able to draw this using arrow-theoretic diagrams, making use of the subtraction operation .) If it should so happen that and are points (terminal objects), then is simply a real number.
In the case of , if this appears as a problem in a textbook without any further context, the default interpretation is supposed to be that is the largest subset of on which is defined, in this case the entire real line ; then and are indeed points, and so is indeed a real number (as it happens, ). In the context of a word problem where stands for an inherently positive quantity, then it would be more appropriate to take to be instead.1 But in such problems, I think it even more natural to take to be an abstract space, which I think of as the space of possible states of the situation described in the problem. While might never be fully defined, various properties of it may be justified as needed on the basis of the intuition behind the problem. The textbooks, by encouraging us to put everything in the problem in terms of a single variable (such as ), effectively ask us to find that this variable mediates an isomorphism between and some subspace of (such as ); this specifies up to specified isomorphism, so no further intuition is needed. But many problems are easier to solve without expressing everything in terms of one variable, and I encourage my students to take a more flexible approach (especially to things like related rates and optimization problems). This just requires them to be a little more careful about keeping track of the context.
I suspect that the thing going on here and elsewhere in the “variable quantity” perspective is that an equality like is interpreted in a way more commonly seen in probability/statistics as in denoting “the set of all states where the random variables and assume the same value.”
Yes, precisely, and this is an equalizer. In general, I'd say that the probability/statistics people have a good handle on this stuff; they know what a random variable really is, after all, and the rest of us just need to learn that all of our variables are much the same sort of thing.
how does one formalize categorically the notion of two quantities (arrows) being “independent” or “constant” with respect to each other?
Like Mike, I don't think that this is really a sensible notion without specifying what the other independent variables are supposed to be. Rather, what should be formalized is the idea that one quantity is determined by another. Working in the context , a -valued quantity is determined by a -valued quantity if there exists a morphism such that . (This definition appears as one of the fundamental concepts in Lawvere & Schanuel's Conceptual Mathematics.)
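The notion of one quantity being determined by another can be tested directly when the context is finite. The following plain-Python sketch rests on assumptions of its own: the state space is a hypothetical range of integers, quantities are functions on it, and `determined_by` is an illustrative helper that tries to build the mediating function as a lookup table, giving up as soon as one value of the determining quantity would need two different outputs.

```python
def determined_by(y, x, states):
    """Return a table f with y(s) == f[x(s)] for every state s, or None if no such f exists."""
    f = {}
    for s in states:
        key, val = x(s), y(s)
        if key in f and f[key] != val:
            return None  # the same value of x would have to give two values of y
        f[key] = val
    return f


states = range(-3, 4)  # hypothetical space of states


def x(s):
    return s


def u(s):
    return s * s


def w(s):
    return s ** 4


print(determined_by(w, u, states))  # {9: 81, 4: 16, 1: 1, 0: 0}: w is the square of u
print(determined_by(x, u, states))  # None: u cannot tell s from -s, so it does not determine x
print(determined_by(u, x, states) is not None)  # True: x determines u
```

Nothing here refers to a list of "other independent variables", which is in line with the remark that determination, unlike "constant with respect to", makes sense on its own.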
What does the “set of states of the world” (also mentioned by Toby) correspond to categorically? Some classifying object?
Sure, although actually it's a coclassifying object. So, while a principal -bundle on (for some topological group and some topological space) is the same as a continuous map from to the classifying space , so an -valued smooth quantity (for some smooth space) in a given context is the same as a smooth map to from a coclassifying space (which I've been calling simply again). So is the coclassifying space for the quantities in the problem.
Of course, I write this as in class, in deference to the textbooks, but I prefer the less overloaded notation . ↩
Lawvere & Schanuel's Conceptual Mathematics
One of my Calculus students came upon this very thread the other day and asked for reading material that would give him some idea of what we were talking about, and I recommended Lawvere & Schanuel. In my opinion, a course using this book should be the first college-level math course that every student takes. Algebra is a prerequisite for it, but not Calculus, so it should come before Calculus. (A bonus is that the practice of requiring Calculus as a prerequisite for unrelated courses such as linear algebra or discrete mathematics, intended to guarantee a level of mathematical maturity, would be served by requiring the course in conceptual mathematics, which is more important to know anyway.)
Of course, first the math teachers have to learn this stuff!
Thanks Toby! How do you define the general form with and equations?
I’m also curious whether you’ve ever tried teaching a course out of Lawvere & Schanuel?
Another question for you, Toby, though not closely related to the subject of this thread. In emphasizing differentials more this semester than before, I’ve found that a lot of my students mix up derivatives and differentials. E.g. they will write things like . Do you have any tricks for alleviating or preventing this confusion?
I just noticed that Sage’s calculus functions use a notion of symbolic variable which seems quite similar to the “variable quantities” under discussion here. The documentation’s description of them as “elements of the symbolic expression ring” suggests that they have a different mathematical formalization in mind, although I haven’t figured out exactly what that means. But their behavior seems quite similar to what we’ve been talking about, e.g. once you declare a symbolic variable x, you can then write y = x^2+1 and differentiate y with respect to x:
var('x')
y = x^2+1
y.derivative(x)
gives 2*x. Although it will also try to guess the variable to differentiate with respect to if you don’t give it one:
y.derivative()
also gives 2*x. Sage also seems to assume that all variables are constant with respect to each other:
var('t')
y = x^2 + t^2
y.derivative(x)
also gives 2*x. Although you can declare one “variable” to be instead a function of the other:
t = function('t',x)
w = x^2 + t^2
w.derivative(x)
uses the chain rule to give 2*t(x)*D[0](t)(x) + 2*x.
Finally, a symbolic expression like y can’t be evaluated like a function, or at least trying to do so
y(3)
gives a DeprecationWarning. But you can make it into a “callable symbolic expression” by designating an order of the variables occurring in it:
z = y.function(x,t)
z(3,8)
I wonder if this would be a good sort of convention to adopt in a calculus class as well, especially one that involves learning to use Sage.
How do you define the general form with and equations?
Now I feel like I ought to think about pulling back to the solution subspace of those equations, but I really only define it for equations with unique solutions on a simply-connected $1$-dimensional domain, that is expressions that can be reduced to $\int_a^b f(x)\,dx$, which I define (following the textbook) as a Riemann integral (although sometimes I feel like I ought to do a Henstock integral). This is an approach that already does not generalize to complex variables, of course; in the multivariable class, I talk about oriented curves and all that.
Do you have any tricks for alleviating or preventing this confusion?
Not ones that work!
Mind you, there are plenty of analogous mistakes without differentials. My goal is that they only make mistakes like this that don't make their final answer wrong.
ETA: So for example, if they put in too many differentials, then they might write this:
the middle lines are wrong, but the last is correct (given the first).
But if they put in too few differentials, then they might write this:
now everything is completely wrong (after the first line).
The latter is a fairly standard Calculus-class error, which using differentials helps to avoid; I much prefer the former error.
@Mike 54: nice idea to look at how people have implemented these things in software. Just a quick question for clarification:
I wonder if this would be a good sort of convention to adopt in a calculus class as well, especially one that involves learning to use Sage.
Do you mean the convention of distinguishing between “symbolic variables” and “callable symbolic expressions”?
If yes, it looks to me (at first sight) that these two notions correspond to our distinction between “variable quantities” (maps with unspecified domain) and “functions” with domains some subset of . In the classical notation it might be the difference between writing and . In the first case would be a variable quantity, in the second case is a function from to itself. Would you agree?
Do you mean the convention of distinguishing between “symbolic variables” and “callable symbolic expressions”?
I guess that’s mostly what I meant. As I said in #54, Sage’s “symbolic variables” do seem to correspond to our “variable quantities”, but I think its “callable symbolic expressions” are not quite the mathematician’s functions, because they still remember the names of their variables. E.g.
f(x) = x^2
(another way to define a callable symbolic expression)
f(3) ===> 9
f(x=3) ===> 9
f(y=3) ===> x^2
So I guess I was wondering whether it would be worth discussing with calculus students the idea of a “function that knows the name of its arguments”.
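Ordinary Python functions are another everyday instance of a "function that knows the name of its arguments", via keyword arguments. The sketch below is plain Python rather than Sage, and the names `f` and `g` are purely illustrative. One difference from the Sage behaviour quoted above should be kept in mind: where Sage quietly returned x^2 for f(y=3), Python refuses a keyword it does not recognise.

```python
def f(x):
    return x ** 2


def g(t):
    return t ** 2


# As mathematical functions, f and g are the same: both square their input.
same_values = all(f(n) == g(n) for n in range(-5, 6))

# But Python remembers that f's parameter is called x ...
by_name = f(x=3)

# ... and that g's is called t, so asking g for "x = 3" is an error.
try:
    g(x=3)
    name_matters = False
except TypeError:
    name_matters = True

print(same_values, by_name, name_matters)  # True 9 True
```

So renaming the dummy variable leaves the positional behaviour unchanged but does change which keyword calls are accepted, which is roughly the distinction between a function and a callable expression that the Sage session brings out.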
Mike 43
Instead I’m saying, let’s fix some particular “scale” infinitesimal $\epsilon$; then a first-order infinitesimal $x$ is one such that $x/\epsilon$ is finite (“limited”), a second-order one is such that $x/\epsilon^2$ is finite, etc.
Well, even more: in the ultrafilter model, one looks at sequences with some limiting behaviour, and the integer power law for comparing asymptotic infinitesimals is not the only possibility. You can have exponentially small ones, e.g. such ratios that say is finite. I hope you agree. (Sorry for bringing up an issue which is already aged in the thread).
@Zoran: Yes, of course. That’s not even particular to an ultrafilter model, e.g. is still infinitesimal, but “less than first order”. But the integer power law is the relevant one for defining derivatives and higher derivatives.
Surely, Mike, I was not considering the issue critical for your calculus discussion, but rather for the intuition that people who know other approaches, primarily SDG, form about nonstandard analysis.
@Toby #56: why would you say that there are too many differentials in the first computation? If we add one more (for example multiplying on left) it seems correct. To understand the confusion of students it would be interesting to understand what the student was thinking when doing that computation.
Sure, too many on the right or too few on the left. I'm basically taking the left-hand side (the simpler one and the one first written down) as indicating what the student meant to do and judging correctness or incorrectness based on that. (But when correcting a paper, I might well amend the left-hand side instead, if that's the simpler fix.)
It’s not clear to me that when students make mistakes like this they are thinking anything, in the sense that we would mean the word. Rather, they just don’t seem to have the same understanding we do that mathematical words and symbols have precise meanings and have to be used correctly.
Yeah, I wouldn't want to defend the thesis that the left-hand side indicates what the student intended in any seriously discriminatory way; I mean, I wouldn't want to assume that the student is thinking clearly enough to discriminate between intending , intending , or intending (the latter two being equal, of course, but maybe not trivially so even to a student who is thinking clearly). I just mean that if I have to pick some way to classify the error (as too many differentials or as too few, in this case), then that's the criterion that I'll use.
I’m also curious whether you’ve ever tried teaching a course out of Lawvere & Schanuel?
No. It might not work very well for the students that we get either; it would need a massive illustrated, hand-holding, problem-filled expansion.
Regarding antidifferentials (#44-49), what about introducing a new notation for “equality up to a local constant”? Since an equation like is not an “equation involving a variable ” in the same sense as anyway (you can’t substitute in it to get anything meaningful), it has to be regarded as an “equation between variable quantities”, and then we can change the sense of “equal” as well. Say that if and are variable quantities, then means that and have the same domain, and on every connected subset of that domain there is a constant such that on that subset (or some simpler version of this statement that would be easier to understand). Then we could write
and even
Re #67: I remember stumbling over that issue sometime as an undergrad, or maybe even a grad student. I think I spent days, or at least hours, trying to figure out why some computation wasn’t working, before I realized that I was implicitly assuming a version of “Cauchy’s invariant rule” for second derivatives (though I didn’t know the name of it), and that it might not be true.
From the perspective of #33 above, the problem arises from neglecting the terms that ought to be there in the second differential. I certainly didn’t understand that at the time, but I might have if someone had taught me calculus using differentials to start with!
@Toby, did either of the two answers on your MO question ever pan out? The Hasse-Schmidt one seems promising, as you said, but as stated it seems to be purely algebraic and so only applies to polynomials. Also, if I understood it correctly, there isn’t an operator that could be applied to anything already containing s – instead there is a separate operator which is just asserted to satisfy the Leibniz rule that you would expect if it were actually “-of-”.
Re #68: Then an important basic result (an easy corollary of the Mean Value Theorem) is that (for differentiable quantities) is equivalent to .
Actually, I've considered formally defining to be the operation taking to its -equivalence class. Then all of the hard work goes into defining multiplication of such an equivalence class by an ordinary quantity (or more precisely into defining the equality relation on formal linear combinations of differentials with coefficients from the ring of quantities). Note that naïvely, every quantity has a differential in this sense, but we'll find that things are better behaved when we restrict to differentiable quantities.
Re #69: I dare say that I spent years on this, off and on, struggling to figure out what the heck was going on. It may actually have only been when I was first assigned to teach Calculus that I forced myself to come to some resolution (and shortly thereafter started writing M.O questions about it). I remember struggling with the minus sign in around the same time (although I resolved that one much earlier).
Re #70: No, I never really slogged through the linked articles. I've really just these past few months settled on my own answer. To wit: is the operation that maps a smooth curve to ; maps to , and so on. Of course, itself maps to . Then we just take the subring generated by the above, within the ring of all operations that map a curve to a number (which is commutative). At least for smooth functions, that's all that there is to it.
Is there a derivation that maps that entire subring to itself? It’s clear what it should do on the generators, of course, but it’s not immediately obvious to me that that yields a well-defined operation.
Anyway, it sounds like a reasonable answer, but I find it a bit unsatisfying not to have a more intrinsic characterization of the subring in question, and also to have to assume in advance the notion of smooth.
I'm not sure what you mean by
have to assume in advance the notion of smooth
As far as the M.O question is concerned, we're working on a smooth manifold (in fact a Cartesian space, without loss of generality), so we have this notion. Even if we then try to make it work more generally for diffeological spaces or the like, all of these still start out with some notion of smooth. (It's the other thread where we're trying to define everything in terms of curves in very general spaces; here we're still trying to understand .)
But if instead you mean that it's unsatisfying to only define this for smooth maps (so not to extend to the case where, say, exists but does not), then I think that it should still work, just with extra effort to keep track of when things might be undefined. (Again, we know ahead of time what's and what's not, so we already know when should be defined.)
It’s clear what it should do on the generators, of course, but it’s not immediately obvious to me that that yields a well-defined operation.
Ah, good point! Actually, I think that I can extend (partially defined) to every operation whatsoever taking a smooth parametrized curve to a real number. Given the curve and a real number , let be the reparametrization of given by . Then given the operation (so is a number), define so that
if this exists. (You can leave as a partially defined operation, or declare that exists only if this limit exists for all .)
This manifestly depends only on the underlying operation, and it does the right thing, recursively, to smooth maps.
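A rough numerical sketch may make this construction concrete. It assumes the definition intended above is the shift derivative, (d eta)(c) = lim_{h -> 0} (eta(c_h) - eta(c)) / h with c_h(t) = c(t + h). The names `eta`, `shift`, `d` and `curve` are illustrative assumptions, and the limit is replaced by a central difference quotient, so this only approximates the operation being discussed.

```python
from typing import Callable

Curve = Callable[[float], float]        # a real-valued curve t -> c(t)
Operation = Callable[[Curve], float]    # assigns a number to a curve


def shift(c: Curve, h: float) -> Curve:
    """Reparametrize c by a shift: c_h(t) = c(t + h)."""
    return lambda t: c(t + h)


def d(eta: Operation, step: float = 1e-3) -> Operation:
    """Approximate (d eta)(c) by a central difference in the shift h.

    Assumption: the exact definition is the limit of
    (eta(c_h) - eta(c)) / h as h -> 0; this is only a numerical stand-in.
    """
    def d_eta(c: Curve) -> float:
        return (eta(shift(c, step)) - eta(shift(c, -step))) / (2 * step)
    return d_eta


# The coordinate quantity x evaluates a curve at t = 0.
x: Operation = lambda c: c(0.0)
x_squared: Operation = lambda c: x(c) ** 2

dx = d(x)              # approximately c'(0)
d2x = d(dx)            # approximately c''(0)
d2_x_squared = d(d(x_squared))

# Hypothetical test curve with c(0) = 2, c'(0) = 3, c''(0) = 1.
curve: Curve = lambda t: 2.0 + 3.0 * t + 0.5 * t * t

# Second differential of x^2 as predicted by the Leibniz rule:
# d^2(x^2) = 2 dx^2 + 2 x d^2x, including the d^2x term.
leibniz = 2 * dx(curve) ** 2 + 2 * x(curve) * d2x(curve)
```

On this curve `d2_x_squared(curve)` and `leibniz` agree up to discretization error, which is the sense in which the operation "does the right thing, recursively": the d^2x term appears without being put in by hand.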
Very nice! You can exclude some uninteresting things by restricting to germs of curves, and I think you can even omit the a priori restriction to smooth curves: consider partial real-valued functions from the set of germs (at 0) of all curves, and say a curve is smooth if is defined at for all coordinate functions . (I’m not sure exactly what I was complaining about re: “smooth”, but whatever it was, this makes me happier.) That feels kind of Froelicher: given the relation between partial operations and curves, we consider the fixed point of the resulting Galois connection generated by the coordinate functions. The point in the other thread is that this doesn’t correctly isolate the differentiable functions on the other side: even if is defined, as an operation, on all smooth , then may not be differentiable in the usual sense unless additionally depends only on the tangent vector of a curve and is a linear function thereof. Right?
Interestingly, I think this context also allows operations like : it’s the operation that takes to . And presumably its differential is . I’m not sure whether this is a good thing or not. I’m currently playing around with a different idea for defining higher differentials; if it works I may post it up somewhere.
Can you think of a good name for these things that include differentials and also higher ones? We can’t really call them “differential forms” once they have and .
(I guess I’m having trouble separating the threads, sorry – in my mind it’s all one discussion. (-: )
We certainly can call them differential forms even when ; they're just not exterior differential forms. The term ‘form’ is quite general and has a venerable history. (Compare ‘quadratic form’, ‘symmetric bilinear form’, etc.) In M.O, I said ‘cojet differential form’, which is not quite as nice a term as ‘exterior differential form’ (since ‘cojet’ is a noun rather than an adjective like ‘exterior’), but it does get at the right idea: that they act on spaces of jets (the limit of which is the space of germs, as you noted).
I like your ; I have successfully calculated (using Taylor's Theorem with Peano's remainder); actually, the calculation works for generally.
Generalizing still further, I conclude that
for any differentiable function of variables, by pushing everything through the definition, applying Taylor's Theorem to , and observing that the unwanted terms drop out in the limit. What more could one possibly want? (In particular, is a derivation.)
Technicality: You wrote in part
say a curve is smooth if is defined at for all coordinate functions
You mean that is smooth at , or else you mean that must be defined at for all and all real numbers .
in my mind it’s all one discussion
Certainly you borrowed notation from an off-site file linked only in the other thread!
You mean that is smooth at
Yes, thanks.
Certainly you borrowed notation from an off-site file linked only in the other thread!
Really? What notation? You used up in #74 here…
spaces of jets (the limit of which is the space of germs
Technicality again, but that doesn’t seem quite right to me; at least, I can’t see a sense in which it’s true. In particular, a germ is not determined by its -jets for , is it?
We certainly can call them differential forms
Okay, I see the point that it’s historically fine, but my experience is that nowadays mathematicians pretty universally say “differential form” to mean “exterior differential form”. I guess “cojet differential form” would suffice to clarify, which might get abbreviated to “cojet form”.
I think my main worry is using the same symbol for the cojet differential and the exterior differential. For instance, pedagogically speaking, if I teach my calc 1 or calc 2 students to calculate with cojet differentials, aren’t they going to be confused when they get to multivariable and I tell them that now ?
I wonder whether cojet forms and exterior forms could be unified in a larger framework? In some sense, all these cojet forms are still only 1-forms: even though they involve higher derivatives, they only act on curves. But we could consider instead real-valued operators on germs of parametrized surfaces or hypersurfaces as well. For instance, if is an operator on germs of curves, we could define its exterior differential as an operator on germs of surfaces by
or perhaps in the case when might be nonlinear it would be better to say
I haven’t checked that this is at all sensible. But it also starts (unsurprisingly) to make me think of the Weil algebras that define the infinitesimal objects in SDG.
Here’s another thought: can we integrate an arbitrary cojet form? Suppose is a real-valued operator on germs of curves, and let be a curve defined on . Then we have a function defined by
and we could define
if the RHS exists. It seems like it ought to follow that
(where is the commutative cojet differential). But it’s late at night, so I could be spewing nonsense…
You used up in #74 here…
Oops, never mind, that was me, not you!
a germ is not determined by its -jets for , is it?
Ah, no, I must have been implicitly assuming that every function (or at least every smooth function) is analytic, and we wouldn't want to restrict to analytic curves. Still, these operations do depend only on the jets, even when the germs differ. But germs are a simpler concept.
if I teach my calc 1 or calc 2 students to calculate with cojet differentials, aren’t they going to be confused when they get to multivariable and I tell them that now ?
In my Calculus classes, I've been using for the exterior differential of . They've already seen by this point, and this gives the right idea regarding skew-commutativity. (In particular, the signs in the product rule
come out right that way. Not that I ever write down anything like this in that class.) So , but this is very different from .
I do tell them that people usually don't put the wedge in there (and that they sometimes don't put the wedge in the wedge product either), and this is OK because they're restricting attention to exterior differential forms.
But even though I don't actually use higher differentials in my Calculus classes[1], they do see differential forms that aren't exterior forms. There are the absolute differential forms, of course, but there's more; consider
It would be criminal not to introduce that in class! But what is ? (or ). It can be thought of as a symmetric bilinear form, but it's also a cojet form. (The two operations, one on a pair of curves and one on a single curve, are related by polarization.)
[1] Now that I understand them better, I might. But expressing, say, the second derivative test for extreme values in terms of differentials instead of derivatives looks so different that it may be too difficult, when it's not in the book. Anyway, the main reason for using differentials in class is that people use them in applied fields, so it's not so justifiable to bring in something that you and I invented ourselves.
these operations do depend only on the jets, even when the germs differ
That’s true if by “these operations” you mean the ones constructed from functions by applying the cojet and algebra operations. In #72 you suggested generating a subring, so I guess this is what you’re thinking of. Although wouldn’t be in that subring, nor would ; we’d need to close up under more functions than the ring operations. The whole ring of operations-on-germs, of course, might include operations that really do depend on the whole germ rather than only the jets, although I can’t think of any examples off the top of my head.
In my Calculus classes, I've been using for the exterior differential of
That’s good! I might do the same when I get to exterior derivatives. (Although I still haven’t decided whether I can justify talking about exterior differential forms at all, given that our standard textbook does everything the traditional way in terms of vectors. Is there a good multivariable calculus textbook that uses differential forms?)
the main reason for using differentials in class is that people use them in applied fields
Hmm, that’s one good reason, but I think another good reason is that they just make the concepts easier to understand and the computations easier to do. However, it’s not clear to me that higher cojet differentials would be much use in single-variable calc for either of those purposes. The main advantage I see right now is if I could somehow avoid talking about derivatives at all and use only differentials, but to be really effective that would require a supporting textbook.
One issue with my proposed notion of integration in #81 is that in general, it will depend on the parametrization of the curve, whereas the integral of an ordinary 1-form along a curve does not (though it does depend on its orientation). However, it does include integration with respect to , which is also parametrization-invariant — I guess what matters for that is not linearity but “degree-1 homogeneity”.
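The parametrization issue can be illustrated numerically. The sketch below assumes the proposed integral of a form eta along a curve c on [a, b] is the ordinary integral over t of eta evaluated on the germ of c at t. The two example forms, sqrt(dx^2 + dy^2) (degree 1) and dx^2 (degree 2), and all function names are illustrative assumptions rather than anything fixed by the thread.

```python
import math


def velocity(c, t, h=1e-6):
    """Central-difference tangent vector of the plane curve c at time t."""
    (x1, y1), (x0, y0) = c(t + h), c(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))


def arc_length_form(c, t):
    """sqrt(dx^2 + dy^2): homogeneous of degree 1 in the velocity."""
    vx, vy = velocity(c, t)
    return math.hypot(vx, vy)


def dx_squared_form(c, t):
    """dx^2: homogeneous of degree 2 in the velocity."""
    vx, _ = velocity(c, t)
    return vx * vx


def integrate(form, c, a, b, n=2000):
    """Midpoint-rule approximation of the integral of form(c, t) over [a, b]."""
    width = (b - a) / n
    return sum(form(c, a + (i + 0.5) * width) for i in range(n)) * width


# The same quarter circle, traversed at two different speeds.
slow = lambda t: (math.cos(t), math.sin(t))            # t in [0, pi/2]
fast = lambda t: (math.cos(2 * t), math.sin(2 * t))    # t in [0, pi/4]

length_slow = integrate(arc_length_form, slow, 0.0, math.pi / 2)
length_fast = integrate(arc_length_form, fast, 0.0, math.pi / 4)
square_slow = integrate(dx_squared_form, slow, 0.0, math.pi / 2)
square_fast = integrate(dx_squared_form, fast, 0.0, math.pi / 4)
```

Both length integrals come out as pi/2, while the dx^2 integrals are pi/4 and pi/2: doubling the speed doubles the degree-2 integral but leaves the degree-1 one alone, consistent with the guess that degree-1 homogeneity is what matters.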
Does it also include integration of absolute 1-forms? Can an absolute 1-form be regarded as a cojet form like defined by
(I changed your notation to to avoid confusion with the absolute value bars.)
Re: #80, the wedge product of two cojet 1-forms and ought probably to be the “cojet 2-form” defined on a surface germ by
I still haven’t decided whether I can justify talking about exterior differential forms at all, given that our standard textbook does everything the traditional way in terms of vectors. Is there a good multivariable calculus textbook that uses differential forms?
I don't know of one; even Dray & Minogue don't go that far.
My justification is that they're already integrating differential forms; the classical expression is already the integral of a differential form; you just need to take it literally. All of the formulas are in my handout (where Page 6 is strictly time-permitting … which so far it hasn't been).
Suppose I start with a function and take its cojet differential over and over again.
It appears that each term in is of the form
for some and some (unordered) partition . Are the coefficients appearing here some well-known combinatorial numbers associated to partitions?
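The coefficients can be generated mechanically, which makes the pattern easy to inspect. This is a minimal sketch assuming only that d is a derivation on commuting symbols with d(f^(k)(u)) = f^(k+1)(u) du and d(d^j u) = d^(j+1) u. The encoding of a term as a pair `(k, orders)` and the function names are hypothetical choices made for illustration.

```python
from collections import Counter


def differentiate(poly):
    """Apply d to a formal sum of terms f^(k)(u) * prod_j d^j u.

    A term is keyed by (k, orders): k is the order of the derivative of f,
    and orders is the sorted tuple of the j's in the factors d^j u.
    """
    result = Counter()
    for (k, orders), coeff in poly.items():
        # d hits f^(k)(u): raises k and contributes one extra factor du.
        result[(k + 1, tuple(sorted(orders + (1,))))] += coeff
        # d hits one factor d^j u at a time (Leibniz rule): j becomes j + 1.
        for i, j in enumerate(orders):
            bumped = orders[:i] + (j + 1,) + orders[i + 1:]
            result[(k, tuple(sorted(bumped)))] += coeff
    return result


def higher_differential(n):
    """Coefficients of the n-th differential of f(u)."""
    poly = Counter({(0, ()): 1})    # start from f(u) itself
    for _ in range(n):
        poly = differentiate(poly)
    return poly


third = dict(higher_differential(3))
fourth = dict(higher_differential(4))
```

Here `third` encodes f'''(u) du^3 + 3 f''(u) du d^2u + f'(u) d^3u. In every term the orders in `orders` sum to n and there are k of them, so each term is indexed by a partition of n into k parts, as described above.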
Over in the other thread, David R posted a link to an MO answer which reminded me to look back at Arnold’s book on classical mechanics, which suggests the following definition of the exterior differential of a cojet (or perhaps “cogerm” would be more appropriate) 1-form:
where is a loop inside the parametrized surface which shrinks to nothing around . (It might be a rectangle or parallelogram, but from the general perspective that restriction seems unaesthetic.)
Comparing this to the definition of the differential from cogerm 1-forms to cogerm 1-forms, and its relationship to the exterior differential acting from 0-forms to 1-forms, suggests the following operation from cogerm 2-forms to cogerm 2-forms:
where is a loop as before, with domain , and is a shifted version of the surface. Is this a 2-form version of the cogerm differential?
Just throwing stuff out there at the moment, hoping sometime soon I’ll have time to think about it all carefully.
Probably “” should be instead the area enclosed by . But having thought about it a little more, I realized those limits don’t really make sense unless the integrals are invariant under reparametrization. So maybe the exterior differential doesn’t really make sense except for degree-1 1-forms? And is there any sort of commutative differential on 2-forms? Would we hope or expect it to behave in any particular way? It feels weird to me that we have the world of cogerm 1-forms with the commutative , and the world of exterior forms with the exterior , which agree in the world of linear degree-1 1-forms and the differential of functions, but are thereafter completely unrelated.
Can an absolute 1-form be regarded as a cojet form like defined by
I would certainly accept this definition of in line with the previous discussion of (where is a cojet form, or more generally a finite list of such, and is a differentiable function); there's no reason that has to be differentiable (we just can't conclude that is differentiable).
So I guess that your question is: if is an exterior -form, then is this the absolute -form called on the absolute differential form page? And the answer is Yes; at least, it certainly does the right thing to a curve.
But not every absolute -form arises in this way! Besides multiplying by an arbitrary -form (so that an absolute -form need not be positive semidefinite), even some positive definite forms, such as , don't arise in this way.
Nevertheless, any absolute -form does have an action on curves (via their tangent vectors, if you follow the definition at absolute differential form), and this is homogeneous of degree , so your integration formula does integrate them.
It feels weird to me that we have the world of cogerm 1-forms with the commutative , and the world of exterior forms with the exterior , which agree in the world of linear degree-1 1-forms and the differential of functions, but are thereafter completely unrelated.
There is some more overlap if you look at symmetric bilinear forms (rather than only the antisymmetric ones that are exterior -forms). Some cojet (or cogerm) forms are linear, and these agree with the exterior -forms; but some cojet forms are quadratic, and these agree with the symmetric bilinear forms. Of course, these are viewed as functions of different things, but they are equivalent by the polarization identities. An arbitrary bilinear form is then given by a quadratic cojet form together with an exterior -form.
This doesn't go so easily into higher rank.
I thought it was about time to record some of this discussion, so I created cogerm differential form.
Looks good! I discussed it in a thread dedicated to it. (Mike already noticed this, but I record it for the sake of future generations.)
Re: #87, the sum of the coefficients of terms in involving is the Stirling number of the second kind : the number of ways to partition an -element set into nonempty subsets. The coefficients themselves are simply the further classification of these partitions according to the multiset of cardinalities of the nonempty subsets (which feels like it ought to have something to do with Young tableaux). This is more obvious if we use the coflare differentials where : then none of the terms can be combined, and each term like evidently represents a particular partition of an -element set into nonempty subsets.
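The counting claim is easy to check by brute force for small n. The following sketch enumerates all partitions of an n-element set, groups them by number of blocks and by the multiset of block sizes, and compares the totals with the Stirling numbers of the second kind computed from the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1). All function names are hypothetical.

```python
from collections import Counter


def set_partitions(elements):
    """Yield every partition of the list `elements` into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def shape_counts(n):
    """Count partitions of {1, ..., n} by (number of blocks, sorted block sizes)."""
    counts = Counter()
    for partition in set_partitions(list(range(1, n + 1))):
        sizes = tuple(sorted(len(block) for block in partition))
        counts[(len(partition), sizes)] += 1
    return counts


def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def blocks_total(n, k):
    """Total number of partitions of {1, ..., n} with exactly k blocks."""
    return sum(c for (blocks, _), c in shape_counts(n).items() if blocks == k)
```

For n = 4 the two-block partitions split as 4 of shape (1, 3) and 3 of shape (2, 2), adding up to S(4, 2) = 7. Those are the coefficients one expects on the two terms involving the second derivative in the fourth differential.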
In coflare differentials, I don't think that makes sense at all; in any case, it doesn't show up in . That's just as well, since the Stirling number doesn't count and as distinct partitions of into nonempty subsets.
Yes, that’s true; I think I meant to say something like .
Or simply that . Either will do, since the first nontrivial coefficient comes from combining , , and , where already for each pair there are two differences between them.
On the subject of partial derivatives, John Denker makes the interesting point that
at http://www.av8n.com/physics/partial-derivative.htm#sec-wedge-ratio. This is easy enough to verify by calculation, but also check out the pictorial explanation.
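For readers who prefer a numerical check to a hand calculation, here is a small sketch. It assumes the identity in question is that the partial derivative of f with respect to x at constant y equals the ratio (df ∧ dy) / (dx ∧ dy), which is what the linked section title suggests. The example quantities x, y, f on a plane with coordinates (u, v), and all function names, are hypothetical.

```python
def gradient(g, u, v, h=1e-6):
    """Central-difference gradient of g(u, v): the components of dg."""
    return ((g(u + h, v) - g(u - h, v)) / (2 * h),
            (g(u, v + h) - g(u, v - h)) / (2 * h))


def wedge(a, b):
    """Coefficient of du ^ dv in the wedge product of the 1-forms a and b."""
    return a[0] * b[1] - a[1] * b[0]


def partial_via_wedge(f, x, y, u, v):
    """(df ^ dy) / (dx ^ dy), evaluated at the point (u, v)."""
    df, dx, dy = gradient(f, u, v), gradient(x, u, v), gradient(y, u, v)
    return wedge(df, dy) / wedge(dx, dy)


# Hypothetical example quantities on a plane with coordinates (u, v).
x = lambda u, v: u + v
y = lambda u, v: u - v
f = lambda u, v: u * u * v

u0, v0 = 1.2, 0.7
x0, y0 = x(u0, v0), y(u0, v0)

# Solving for u and v gives f = (x + y)^2 (x - y) / 8, so differentiating
# with respect to x at constant y by hand yields:
by_hand = (2 * (x0 + y0) * (x0 - y0) + (x0 + y0) ** 2) / 8
via_wedge = partial_via_wedge(f, x, y, u0, v0)
```

The two values agree to within finite-difference error. The wedge-ratio version never needs f written explicitly in terms of x and y, which is part of the appeal of the formula.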
Trying to make the previous comment work with second derivatives:
Suppose that is a function of . Then
so
Thus,
which expands to
On the other hand,
so
so
Now suppose that is a function of and . Then
so
so
Thus,
which unfortunately can't be expanded without abandoning the notation.
On the other hand,
so
so
Re #87:
The coefficients appearing here are those that appear in Bell polynomials, and they are well known (although not by me, until yesterday) both to come from counting partitions and to give a formula for the higher derivatives of a composite function, Faà di Bruno's formula. This formula gives the higher cojet differentials of , where is a real-valued function of a real variable, differentiable at least times, and is a real-valued quantity (technically a real-valued function on some manifold), also differentiable at least times:
where the sum is taken over the set of all partitions of , each partition being thought of as a subset of the powerset of (so that both and any have a cardinality given by ).
A partly multivariable version of the formula may be adapted to coflare forms. First some notation: if is a finite multisubset of , then write for (which is unambiguously defined if is at least times differentiable). Also, if (a set, not any multiset), then let be (a multiset). With this notation,
a partial decategorification of the cojet version.
A fully multivariable version of the formula would also allow to be a function of variables, with as the order- case, but I haven't tried to think that through yet.
ETA: You can take and to be tuples rather than multisets, if you prefer. But the order doesn't matter, just as with partial derivatives.