Welcome to nForum
    • CommentRowNumber1.
    • CommentAuthorMike Shulman
    • CommentTimeNov 6th 2013

    My calculus book uses many different notations for the derivative of $y = f(x)$ with respect to $x$, such as

    $$\frac{dy}{dx} \quad y' \quad f'(x) \quad D f(x) \quad \frac{df}{dx}$$

    Recently I’ve found that I kind of object to a couple of these. For instance, consider $\frac{df}{dx}$. One of the things I try to teach my students is that when we define a function $f$ by writing $f(x) = x^2$, say, the variable $x$ is a dummy variable; if we wrote $f(t) = t^2$ we would be defining the same function, namely the one which squares its input. But if “$f$” denotes a function of this (usual mathematical) sort, then how can we write $\frac{df}{dx}$ to mean its derivative, since $f$ doesn’t know that we called its input variable $x$?

    Notations like $f'(x)$ and $D f(x)$ don’t have this problem, because $f'$ and $D f$ denote the derivative function of $f$, which assigns to each input value the derivative of $f$ at that value, and so $f'(x)$ and $D f(x)$ just mean evaluation of this function at that value. I suppose we could interpret $\frac{df}{dx}$ similarly if we regarded “$\frac{df}{d}$” as the derivative function of $f$, which we evaluate at something by placing it after the $d$ in the denominator, but this seems strained, and would suggest even odder notations such as writing $\frac{df}{d3}$ for $f'(3)$.
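    To make the contrast concrete, here is a minimal Python sketch, assuming a symmetric difference quotient as a numerical stand-in for the exact derivative (the names `D`, `f`, `g` and the step size `h` are hypothetical, chosen for illustration). The operator `D` takes a function and returns a function, so `D(f)(3)` plays the role of $f'(3)$, and no input-variable name is ever consulted: the definitions using `x` and `t` as dummy variables give the same derivative.

```python
def D(func, h=1e-5):
    """Return an approximation of the derivative function of func.

    Uses a symmetric difference quotient; h is an assumed step size.
    """
    def derivative(a):
        return (func(a + h) - func(a - h)) / (2 * h)
    return derivative


# The same squaring function, defined with two different dummy variables.
def f(x):
    return x ** 2


def g(t):
    return t ** 2


# D(f) is itself a function; evaluating it at 3 approximates f'(3) = 6.
print(round(D(f)(3), 6))  # 6.0
print(round(D(g)(3), 6))  # 6.0
```

    Because `D` only ever sees the function object, renaming the bound variable cannot change its result.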

    It was pointed out to me that this kind of notation is even commoner in multivariable calculus, where we write things like $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, and in this case there aren’t good alternatives available, since we have to indicate somehow whether it is the first or second input variable of $f$ with respect to which we take the derivative.

    From a differential-geometric viewpoint, one answer is to say that $x$ denotes a standard coordinate function on the 1-dimensional manifold that is the domain of $f$, and so we are actually taking the derivative with respect to a vector field associated to that function. We can even regard $\frac{df}{dx}$ as a literal quotient of differential 1-forms, since at each point the 1-forms on a 1-manifold form a 1-dimensional vector space, so the quotient of two of them (with nonzero denominator) is a real number. But while logically consistent, this seems to undercut the force of the lesson of dummy variables, since we are endowing $x$ with a special status not shared by $t$.

    Using $x$ to denote the coordinate function has the other interesting consequence that it makes it okay to say “the function $x^2+1$” (since we can multiply and add functions together), instead of insisting on saying “the function $f$ defined by $f(x) = x^2+1$”. Again this feels like it undercuts the lesson of what a function is, and yet I find that it’s hard to teach a calculus class without eventually slipping into saying “the function $x^2+1$”. With $x$ as a function we can also write “$f = x^2+1$”, which again is something that I’m used to indoctrinating my students against.

    I also have a problem with the notation $y'$, for a more pragmatic reason. Suppose we want to take the derivative of $y = (3x+1)^4$ using the chain rule. A nice way to do it is to make a substitution $u = 3x+1$, so that $y = u^4$, and then use differentials:

    $$du = 3\,dx$$

    $$dy = 4u^3\,du = 4(3x+1)^3(3\,dx)$$

    $$\frac{dy}{dx} = 12(3x+1)^3$$

    The problem here is that the notation $y'$ doesn’t indicate what variable we differentiate with respect to, and in this calculation we have two derivatives of $y$, namely $\frac{dy}{dx} = 12(3x+1)^3$ and $\frac{dy}{du} = 4u^3$, which are not equal even after substituting the value $u = 3x+1$. Here the solution seems to be straightforward: just don’t write $y'$. But if we can write $f = x^2+1$ just like $y = x^2+1$, and if we allow the notation $f'(x) = 2x$ and hence also $f' = 2x$, then we should just as well have $y' = 2x$.
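    A quick numerical sanity check of this chain-rule computation, as a minimal Python sketch (the helper `numeric_derivative` and the sample point $x = 0.5$ are assumptions for illustration). For $y = (3x+1)^4$ it confirms that $\frac{dy}{dx} = 12(3x+1)^3$, and that $\frac{dy}{du} = 4u^3$, evaluated at $u = 3x+1$, differs from it by the factor $\frac{du}{dx} = 3$; so “the derivative $y'$” really is ambiguous.

```python
def numeric_derivative(func, a, h=1e-6):
    """Symmetric difference quotient approximating func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)


def u_of_x(x):
    return 3 * x + 1


def y_of_u(u):
    return u ** 4


def y_of_x(x):
    return y_of_u(u_of_x(x))


x0 = 0.5
u0 = u_of_x(x0)  # 2.5

dy_dx = numeric_derivative(y_of_x, x0)   # derivative with respect to x
dy_du = numeric_derivative(y_of_u, u0)   # derivative with respect to u

print(dy_dx, 12 * (3 * x0 + 1) ** 3)     # both approximately 187.5
print(dy_du, 4 * u0 ** 3)                # both approximately 62.5
print(dy_dx / dy_du)                     # approximately 3, i.e. du/dx
```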

    Does anyone have a good solution? I feel like at least part of the problem comes from confusing $\mathbb{R}$ as the real numbers with $\mathbb{R}$ as a 1-dimensional manifold, but I haven’t exactly managed to pin down yet how to solve it from that point of view.

    • CommentRowNumber2.
    • CommentAuthorhilbertthm90
    • CommentTimeNov 6th 2013

    This doesn’t answer the broader question, but there is a “better” alternative for the multivariable calculus notation. It is pretty typical to use $f_y(x,y)$ to denote the partial derivative of $f$ with respect to $y$.

    • CommentRowNumber3.
    • CommentAuthorZhen Lin
    • CommentTimeNov 6th 2013

    For what it’s worth, Mathematica writes multivariable derivatives as $f^{(i,j,\ldots,k)}$, which means differentiating $i$ times with respect to the first variable, $j$ times with respect to the second variable, etc. Unfortunately, this notation presupposes the commutativity of partial derivatives.
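    The appeal of the positional notation is that it refers to argument slots rather than variable names. Here is a minimal Python sketch of first-order partials in that style, assuming a symmetric difference quotient (the helper `partial` is hypothetical and is not Mathematica’s `Derivative`; note that it counts slots from 0). `partial(f, 1)` differentiates in the second slot, mirroring $f^{(0,1)}$, and gives the same result whichever dummy names were used to define `f`.

```python
def partial(func, slot, h=1e-6):
    """Return an approximation of the partial derivative of func
    in argument position `slot` (0-based), independent of any names."""
    def result(*args):
        forward = list(args)
        backward = list(args)
        forward[slot] += h
        backward[slot] -= h
        return (func(*forward) - func(*backward)) / (2 * h)
    return result


def f(x, y):
    return x ** 2 * y


def f_renamed(s, t):   # the same function, different dummy variables
    return s ** 2 * t


f_01 = partial(f, 1)                    # analogue of f^{(0,1)}
print(f_01(2.0, 5.0))                   # approximately 4.0 (= x**2 at x = 2)
print(partial(f_renamed, 1)(2.0, 5.0))  # same value
```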

    On $f'$ and $\frac{d y}{d x}$: I think you are right there. In order to use notations like $f'$ consistently, it appears to be necessary to abandon notions like “change of variable” and regard $x \mapsto (3 x + 1)^4$ and $u \mapsto u^4$ as distinct functions. So it is incompatible with the setup where calculus expressions are regarded as functions on some manifold, because there are no preferred coordinates. From a syntactic point of view it does seem rather disturbing that the $x$ in the denominator of $\frac{d y}{d x}$ is bound but free in $\frac{d y}{d x}$ itself. It’s almost as if $\frac{d}{d x}$ is some kind of variable binding operator like $\lambda$ or $\prod$ or $\sum$… except for the fact that it doesn’t bind the variable at all! Compare:

    $$x : \mathbb{R} \vdash y \equiv (3 x + 1)^4 : \mathbb{R}$$

    $$x : \mathbb{R} \vdash \frac{d y}{d x} \equiv 12 (3 x + 1)^3 : \mathbb{R}$$

    $$\vdash \lambda x . (3 x + 1)^4 : \mathbb{R} \to \mathbb{R}$$

    Accordingly, we should also require the use of substitutions instead of evaluations when working with the $\frac{d y}{d x}$ notation: so $\frac{d y}{d x} |_{x = 0}$ instead of $\frac{d y}{d x} (0)$.

    • CommentRowNumber4.
    • CommentAuthorDavid_Corfield
    • CommentTimeNov 6th 2013

    What would elementary calculus look like in cohesive homotopy type theory?

    • CommentRowNumber5.
    • CommentAuthorUrs
    • CommentTimeNov 6th 2013
    • (edited Nov 6th 2013)

    David,

    with cohesion one can essentially characterize a map $\mathbf{d} \colon \mathbb{R} \longrightarrow \mathbf{\Omega}^1_{cl}$ such that postcomposition of a function $f \colon X \longrightarrow \mathbb{R}$ with this map yields the derivative $\mathbf{d} f \in \mathbf{\Omega}^1_{cl}(X)$ of $f$. This is discussed in some detail at geometry of physics, in the section “4. Differentiation”.

    A variant is a certain homotopy pullback of this construction, which yields variational calculus, as discussed there in the section “In terms of smooth spaces”.

    In differential cohesion one can get hold of the infinitesimal interval $D$ and then proceed as in synthetic differential geometry. For instance, using this one can describe differential equations, as discussed there in the section “In terms of synthetic differential equations”.

    Moreover, differential cohesion encodes D-geometry and hence in principle allows one to talk about differential equations in that way.

    • CommentRowNumber6.
    • CommentAuthorUrs
    • CommentTimeNov 6th 2013

    I have added my previous reply as a paragraph to differential calculus, in the section “In cohesive homotopy theory”.

    • CommentRowNumber7.
    • CommentAuthorDavid_Corfield
    • CommentTimeNov 6th 2013
    • (edited Nov 6th 2013)

    Thanks. What I meant to ask was this: suppose that, in some idealised setting, Mike had complete control over his students’ maths education and had taught them HoTT from an early age. When he then comes to teach calculus and adds the cohesive axioms, is he ever confronted with the issues he raised in #1?

    He has $f: \mathbb{R} \to \mathbb{R}$, with $f(x) = (3 x + 1)^4$. So $f = g(h)$ for obvious $g$ and $h$. So then using your $\mathbf{d}$,

    $$\mathbf{d} f = \mathbf{d} g(h),$$

    at which point a chain rule kicks in. (I’m waiting for geometry of physics to load to read Prop. 26, but it takes about 10 minutes to typeset on this machine!)

    So all of Mike’s worries are over!

    • CommentRowNumber8.
    • CommentAuthorUrs
    • CommentTimeNov 6th 2013

    I’m waiting for geometry of physics to load to read Prop. 26, but it takes about 10 minutes to typeset on this machine!

    Oh, that’s a pain. For me it’s slow, but not quite this slow.

    This is with math rendered by MathJax, I suppose? I suppose that in some other browser, and/or with the requisite fonts installed, it should take no extra time?

    Once there was this vision that we would have decent math on the web. But somehow there still seems to be a long, long way to go, for some reason.

    • CommentRowNumber9.
    • CommentAuthorUrs
    • CommentTimeNov 6th 2013

    So all of Mike’s worries are over!

    While I suppose you are joking (right? :-) this reminds me that maybe we shouldn’t hijack Mike’s thread too much.

    My reply to the topic here would be: since we are humans and not proof assistants, whenever we actually do some work we’ll adopt convenient “abuse of notation”. One should alert students as to what’s really going on, but I wouldn’t worry too much about enforcing a formally consistent notation.

    • CommentRowNumber10.
    • CommentAuthorDavid_Corfield
    • CommentTimeNov 6th 2013

    It wasn’t completely a joke. There was something like the thought that if cohesive HoTT is the God-given way to do things, then it might suggest the least bad forms of “abuse of notation”.

    • CommentRowNumber11.
    • CommentAuthorMike Shulman
    • CommentTimeNov 6th 2013

    @hilbertthm90: I don’t understand how $f_y(x,y)$ is better. You still have the variable $y$ occurring in the notation $f_y$ for the partial derivative function. Am I misunderstanding?

    @Urs: It’s true that we always abuse notation, but I find that the correct and incorrect ways to abuse notation are one of the hardest things for beginning math students to understand. I haven’t spent a lot of time thinking about it, but I’ve generally assumed that it’s not really possible to understand how to abuse notation until you understand “the way mathematics works” at a sufficiently deep level, so that when teaching students who don’t yet understand math, it’s better to try to avoid abusing notation as much as possible.

    • CommentRowNumber12.
    • CommentAuthorRodMcGuire
    • CommentTimeNov 6th 2013

    From the perspective of Logic Programming, and a particular variant, “variables” are a notational solution that conflates 2 things: labeling and binding.

    Traditionally, formulas are written as linear strings of symbols that are parsed into trees. One use of variables is as external labels to note that parts of a “tree” share the same substructure, so that it is really not a tree.

    For example, in the formula $x + x$ the two $x$s can be seen as the same substructure, and substituting “1” for “$x$” involves changing just one substructure, not two. The result of the substitution, “1 + 1”, has dropped any labeling that indicates that the two “1”s are really the same and came from the same place. One can explicitly use external labels for this situation using a name followed by “?”. Then the two formulas would be notated as $x?\top + x?\top$ and $x?1 + x?1$, where $\top$ indicates that the $x?$ structure in the first formula is “unbound”.

    Binding traditionally has two states: unbound, or bound to something totally specific. The alternative perspective is that “binding” is a continuum that takes place in a lattice of structures where $\top$ means “completely unknown” and $\bot$ means “totally contradictory”.

    In an expression like $x+x$, $x$ is rarely completely unknown. Usually $x$ is known to be some specific type of number, but which exact number is unknown. One could even give $x$ an intermediate binding such as $1 \vee 2$. Evaluating $x?(1 \vee 2) + x?$ gives the result $2 \vee 4$, while if the structure is not shared, evaluating $x?(1 \vee 2) + y?(1 \vee 2)$ gives $2 \vee 3 \vee 4$.
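    One way to see the difference between shared and unshared structure is to model an intermediate binding such as $1 \vee 2$ as the set of values it may still take. A minimal Python sketch under that assumption (the set encoding and the function names are illustrative, not taken from any particular logic-programming system):

```python
from itertools import product


def add_shared(x_values):
    """x?(v) + x?: both summands are the same structure, so they
    always take the same value."""
    return {x + x for x in x_values}


def add_unshared(x_values, y_values):
    """x?(v) + y?(w): independent structures, so every combination
    of values is possible."""
    return {x + y for x, y in product(x_values, y_values)}


one_or_two = {1, 2}                                   # the binding 1 ∨ 2
print(sorted(add_shared(one_or_two)))                 # [2, 4]
print(sorted(add_unshared(one_or_two, one_or_two)))   # [2, 3, 4]
```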

    There are two notions of substitution. “Binding substitution” or “unification” is used to make structures more specific. For example, $(x?\top + x?\top) \wedge (1 + \top)$ is unified to $x?1 + x?1$, while $(x?\top + x?\top) \wedge (1 + 2)$ becomes $x?\bot + x?\bot$, because $1 \wedge 2$ becomes the contradictory structure.
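    Unification can be sketched in a similar set encoding, assuming $\top$ is represented by `None` (no constraint yet) and $\bot$ by the empty set (contradiction); the meet $\wedge$ is then set intersection. This is an illustrative model only (the names `TOP`, `meet` and `unify_pair` are hypothetical):

```python
TOP = None            # ⊤: completely unknown, no constraint yet


def meet(a, b):
    """Combine two bindings; the empty set plays the role of ⊥."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a & b


def unify_pair(shared_binding, left, right):
    """Unify the pattern x? + x? (one shared structure) with
    left + right: the shared structure must satisfy both sides."""
    return meet(meet(shared_binding, left), right)


# (x?⊤ + x?⊤) ∧ (1 + ⊤)  gives  x?1 + x?1
print(unify_pair(TOP, {1}, TOP))   # {1}
# (x?⊤ + x?⊤) ∧ (1 + 2)  gives  x?⊥ + x?⊥
print(unify_pair(TOP, {1}, {2}))   # set()
```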

    Actual true substitution is rare in maths. Rarely does one substitute $2$ for $1$ in an expression like $1 + 1$ to give $2 + 2$, though there can be systems where some structure holding $1$ is regarded as a default value that can be overridden. The result of substituting $2$ for $1$ in $1 + 1$ depends on whether the two $1$s are the same structure or not: $1$ itself is not substituted for; instead, a possibly shared structure bound to $1$ gets rebound.

    • CommentRowNumber13.
    • CommentAuthorMike Shulman
    • CommentTimeNov 7th 2013

    Rod, does any of that suggest an answer to my question?

    • CommentRowNumber14.
    • CommentAuthorRodMcGuire
    • CommentTimeNov 7th 2013

    Rod, does any of that suggest an answer to my question?

    It does to your question “What is a variable” :)

    But to your more elaborate question it seemed like you were complaining about “variables” that played the role of something like labels, flags, or indices into structures without being “true variables” that can get bound. I just wanted to mention an alternative way of thinking about variables, binding, and substitution which might fit your problem if I understood the math enough, however I got a little carried away in my typing.

    And for those who are becoming Hegelian, I wanted to sneak in the opposition between “totally contradictory” and “completely unknown” :), though it doesn’t seem to involve adjoints.

    I’m sorry.

    • CommentRowNumber15.
    • CommentAuthorMike Shulman
    • CommentTimeNov 7th 2013

    My question wasn’t really “what is a variable”; that was just the best short-ish title for this kind of rambly question that I could think of. I don’t think I was complaining about “variables that can’t be bound”, but maybe I was; can you explain further? What does your description have to say about notations for derivatives?

    • CommentRowNumber16.
    • CommentAuthorMichael_Bachtold
    • CommentTimeNov 9th 2013
    • (edited Nov 9th 2013)

    Hello Mike,

    you seem to suggest that these issues are resolved if we interpret $x$ as a function instead of a variable, a point of view I sympathize with. Unfortunately I don’t quite understand the arguments against this step. You seem to mention two:

    But while logically consistent, this seems to undercut the force of the lesson of dummy variables, since we are endowing $x$ with a special status not shared by $t$.

    and

    With $x$ as a function we can also write “$f = x^2+1$”, which again is something that I’m used to indoctrinating my students against.

    Correct me if I’m wrong, but both of these arguments disappear if we interpret $x$ as a function: the first because the function $t$ might (or might not) be the same as the function $x$. (This seems in agreement with applications of calculus where different variables like time $t$ and position $x$ are not interchangeable.) The second argument is resolved since, for a function $y=f(x)$, all of the symbols $y$, $f(x)$, $f$ and $y(x)$ now denote the same thing. The notation $f(x)$ would be a different notation for the composition $f\circ x$ (with $x$ usually denoting the identity function).

    So maybe there are other arguments against this convention?

    Another question: if $y$ and $x$ only denote variables, should the symbol $\frac{dy}{dx}$ denote a variable or a function?

    Edit (after reading your question again): you also suggest that the “evil” notation is the primed one, $y'$, which I agree with. Physicists have a convention of writing a dot, but only to denote derivatives with respect to time, so that doesn’t have the same ambiguity. Also, if $Df$ denotes the differential $df$ as Todd suggests, then that’s OK, since it has a different meaning from $\frac{df}{dx}$.

    Cheers Michael

    • CommentRowNumber17.
    • CommentAuthorTodd_Trimble
    • CommentTimeNov 9th 2013

    Sorry to enter this discussion so late. But Mike: it sort of looks like you’ve answered your own question back in #1! Is there a problem with just using $D f$ notation?

    It’s probably hard to tease apart for calculus students what is really going on, but suppose we start with a function $f: \mathbb{R} \to \mathbb{R}$ and apply the tangent bundle functor $(-)^D$ from SDG, keeping in mind the Kock-Lawvere axiom asserting an isomorphism $(pos, vel): \mathbb{R}^D \to \mathbb{R} \times \mathbb{R}$, where the first component is evaluation at the point $1 \to D$. Then notationally it seems quite alright to describe $f^D$ in terms of a mapping that is customarily denoted as

    $$(p, v) \mapsto (f(p), D f|_p \cdot v)$$

    and this carries over just fine to the multivariate setting. Maybe it’s even a really good idea to implant the thought that the derivative $D f|_p$ (or $D f(p)$ if you’d prefer) is not really to be thought of as just a “number” but as a linear function (which can be characterized by a number in the 1-dimensional setting, by setting $v = 1$ and taking $D f(p) \cdot 1$)?
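    The map $(p, v) \mapsto (f(p), D f|_p \cdot v)$ can be computed exactly, for polynomial $f$, with dual numbers $p + v\varepsilon$ where $\varepsilon^2 = 0$, a standard algebraic stand-in for the infinitesimal $D$ of the Kock-Lawvere axiom. The stand-in and the `Dual` class below are illustrative assumptions, not SDG itself. Applying $f$ to $p + v\varepsilon$ yields $f(p) + (D f|_p \cdot v)\varepsilon$, i.e. the pair $(f(p), D f|_p \cdot v)$, and the second component is visibly linear in $v$:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number pos + vel*eps with eps**2 == 0."""
    pos: float
    vel: float

    def __add__(self, other):
        other = as_dual(other)
        return Dual(self.pos + other.pos, self.vel + other.vel)

    __radd__ = __add__

    def __mul__(self, other):
        other = as_dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps**2 = 0
        return Dual(self.pos * other.pos,
                    self.pos * other.vel + self.vel * other.pos)

    __rmul__ = __mul__


def as_dual(value):
    """Treat a plain number as a constant (velocity zero)."""
    return value if isinstance(value, Dual) else Dual(value, 0.0)


def tangent_map(func, p, v):
    """The induced map (p, v) -> (f(p), Df|_p . v)."""
    out = func(Dual(p, v))
    return out.pos, out.vel


def f(x):
    u = 3 * x + 1
    return u * u * u * u   # (3x + 1)^4, written with * only


print(tangent_map(f, 0.5, 1.0))   # (39.0625, 187.5)
print(tangent_map(f, 0.5, 2.0))   # (39.0625, 375.0): linear in v
```

    This is the same mechanism that forward-mode automatic differentiation uses.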

    If this pedagogical point is to be driven home seriously, I suppose you could dismiss notations like $\frac{d f}{d x}$ as archaisms, reflecting an earlier age when notions of functions, variables, etc. hadn’t been fully worked out.

    • CommentRowNumber18.
    • CommentAuthorMichael_Bachtold

    Is what Todd denotes by $Df$ the same as $df$, the differential of the function $f$?

    • CommentRowNumber19.
    • CommentAuthorTodd_Trimble
    • CommentTimeNov 9th 2013

    Michael_B: as far as I’ve always seen it used, if $f: M \to N$ is a smooth mapping and $p \in M$, then $D f(p): T_p(M) \to T_{f(p)}(N)$ is the derivative mapping at $p$, also sometimes called the Jacobian at $p$. If $f$ is real-valued, then all the tangent spaces $T_{f(p)}(\mathbb{R})$ are canonically identified with $\mathbb{R}$, and the resultant linear functional $D f(p): T_p(M) \to \mathbb{R}$ is an element in the cotangent space of $M$ at $p$, also denoted $d f(p)$ as you say.
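    For a real-valued $f$ on $M = \mathbb{R}^2$, the linear functional $D f(p) = d f(p)$ can be sketched numerically as a directional derivative, assuming a symmetric difference quotient (the helper `differential` and the example function are hypothetical). The final check illustrates that it is linear in the tangent vector $v$, so it is determined by its values on a basis, i.e. by the partial derivatives:

```python
def differential(func, p, h=1e-6):
    """Approximate df(p) as a linear functional on tangent vectors
    v at p, via a symmetric difference quotient along v."""
    def df_p(v):
        forward = [pi + h * vi for pi, vi in zip(p, v)]
        backward = [pi - h * vi for pi, vi in zip(p, v)]
        return (func(forward) - func(backward)) / (2 * h)
    return df_p


def f(point):
    x, y = point
    return x ** 2 + 3 * x * y


p = [1.0, 2.0]
df_p = differential(f, p)
e1, e2 = [1.0, 0.0], [0.0, 1.0]
v = [0.5, -4.0]

# df(p)(v) agrees with 0.5*df(p)(e1) - 4*df(p)(e2) up to rounding
print(df_p(v), 0.5 * df_p(e1) - 4.0 * df_p(e2))
```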

    • CommentRowNumber20.
    • CommentAuthorMike Shulman
    • CommentTimeNov 10th 2013

    @Michael, you seem to be saying that I should just give up on indoctrinating my students not to write $f = x^2+1$, and on teaching them that $f(x) = x^2$ and $f(t) = t^2$ define the same function? I think it is important to distinguish between the values of a function and the function as an object in its own right. Only once you really understand that are you justified in failing to notate it.

    • CommentRowNumber21.
    • CommentAuthorMichael_Bachtold

    Hello Mike. I guess that’s what I’m saying :) Admittedly, I’m not sure it’s a valid solution to the problem, either mathematically or pedagogically. But it seems to me that it corresponds more closely to common practice in calculus.

    Already a statement like $y=f(x)$ is mostly read as “$y$ is a function of $x$”, and less often as “$y$ is the value of the function $f$ at input $x$” (at least among engineers and physicists). Or I have seen statements like “suppose $V(t)$ is the volume of water at time $t$” together with a plot whose vertical axis is labelled $V$, without introducing an additional variable to denote the value of the function $V$.

    Or consider how calculus textbooks discuss Cartesian coordinates $(x,y)$ and polar coordinates $(r,\theta)$ in the plane. The easiest way for me to think about this is by interpreting $x, y, r, \theta$ as functions on the plane. Wouldn’t it be cumbersome to introduce additional names to denote the functions of which $x, y, r, \theta$ are the values?

    Unfortunately I don’t yet see clearly what trouble such a point of view causes. (Let’s restrict to mathematical problems, since the pedagogical ones are hard to predict.)

    Probably we do have to distinguish between the function and its value sometimes, but can’t we still do this, say, by obtaining the value of $x$ at $3$ by precomposing with the constant function $3$?

    • CommentRowNumber22.
    • CommentAuthorMike Shulman
    • CommentTimeNov 10th 2013

    I agree entirely that it doesn’t cause any problems mathematically. My point is entirely a pedagogical one.

    • CommentRowNumber23.
    • CommentAuthorTobyBartels
    • CommentTimeNov 19th 2013

    Like you, Mike, I try to teach my Calculus (and Algebra) students the difference between a function and its value, even though the textbooks do what they can to undermine my efforts. Of course, it is important to see the abuses of notation that they're liable to meet in applied fields, but (as you say) they have to understand the correct way first. So I tell them that the textbook abuses notation, but I try to never abuse it myself. In particular, I tell them that $y'$ is ambiguous, so I don't use it; they are allowed to (since the book does) but I recommend against it. I use $f'(x)$ and $\mathrm{d}y/\mathrm{d}x$, as you suggest, instead. (Besides that, $\mathrm{D}f(x)$ is all right, too; but since our book doesn't use it, I only mention it in passing.)

    As for $\mathrm{d}f/\mathrm{d}x$, this is a bit subtle; $\mathrm{d}f(x)/\mathrm{d}x$ is just fine; in fact, if $y = f(x)$, then $\mathrm{d}y/\mathrm{d}x = \mathrm{d}f(x)/\mathrm{d}x = f'(x)$. But that doesn't make $\mathrm{d}f/\mathrm{d}x$ a legitimate synonym of $f'$! It's a matter of logic; like Rod #12 said, the two $x$s stand for the same thing, so you can't substitute for one without substituting for the other. Even this is subtle; if you substitute, say, $3$ for $x$ in $f'(x) = \mathrm{d}f(x)/\mathrm{d}x$, then you get $f'(3) = \mathrm{d}f(3)/\mathrm{d}3$, which doesn't quite work. On the other hand, if you substitute $t$ for $x$ instead (a change of variable), then $f'(t) = \mathrm{d}f(t)/\mathrm{d}t$ is fine. This works better with differentials; $\mathrm{d}f(x) = f'(x)\,\mathrm{d}x$ becomes $\mathrm{d}f(3) = f'(3)\,\mathrm{d}(3)$, which is fairly trivial but at least correct. The problem is that, in writing $\mathrm{d}f(x)/\mathrm{d}x$, you're tacitly assuming that $\mathrm{d}x$ is nonzero, that is, that $x$ is a variable quantity¹. (This is the origin of the term ‘variable’, I believe, even though we now use that also for symbols that stand for constants.) So you should only substitute something variable for it. (But in $\mathrm{d}f(x) = f'(x)\,\mathrm{d}x$, no such assumption is being made.) This is really no more mysterious than why you can't substitute $1$ for $x$ in $(x^2 - 1)/(x - 1)$.
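    This point can be sketched by modelling a variable as a smoothly varying quantity, i.e. a function of some parameter $s$, and its differential as its rate of change along $s$. That encoding, the example $f(x) = x^2$, and all names below are illustrative assumptions. The identity $\mathrm{d}f(x) = f'(x)\,\mathrm{d}x$ holds whether $x$ varies or is the constant $3$, but the quotient $\mathrm{d}f(x)/\mathrm{d}x$ only makes sense when $\mathrm{d}x \neq 0$:

```python
def rate(quantity, s, h=1e-6):
    """Differential of a quantity (a function of the parameter s),
    measured per unit change of s."""
    return (quantity(s + h) - quantity(s - h)) / (2 * h)


def f(x):
    return x ** 2


def f_prime(x):
    return 2 * x


def moving_x(s):          # a nonstationary variable
    return 1 + s


def constant_x(s):        # the constant 3, viewed as a variable
    return 3.0


for x in (moving_x, constant_x):
    s0 = 2.0
    dx = rate(x, s0)
    df_x = rate(lambda s: f(x(s)), s0)
    # df(x) = f'(x) dx holds in both cases
    assert abs(df_x - f_prime(x(s0)) * dx) < 1e-6
    if dx != 0:
        print(x.__name__, "df(x)/dx =", df_x / dx)
    else:
        print(x.__name__, "dx = 0, so df(x)/dx is undefined")
```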

    What does it mean for $x$ to stand for a variable quantity? Doesn't it stand for a real number? Yes … but it stands for a variable real number. This brings us to the question in the title: what is a variable? Lawvere said that a variable can be any morphism in any category; then a variable real number (aka a real-valued variable or simply a real variable) is a morphism whose target is the space² of real numbers. For some reason, the only field in which this penetrates the undergraduate curriculum is statistics; there, they know that a random variable is a measurable function on a measurable space (typically valued in the measurable space of real numbers with its Borel σ-algebra), a morphism in the category of measurable spaces and measurable functions (or something like it). Even in an elementary treatment where every random variable is defined on a finite space and the word ‘measurable’ is never uttered, they still give a definition of ‘random variable’. In Calculus, we usually study smooth variables (or smoothly varying quantities), which are morphisms in the category of smooth manifolds and smooth functions (or something like it). This is actually the first lesson in my Applied Calculus course: what a smooth variable is (very roughly, of course). (In the regular Calculus course, we don't usually assume that everything is smooth, so I have to bring this in later.)
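    The statistics example can be made concrete for a finite sample space, where every function is measurable. Here is a minimal Python sketch, assuming a fair die as the sample space (all names are illustrative): the random variable `X` is just a function out of the sample space, and applying a function $f$ to it, as in $f(X)$, is composition.

```python
from fractions import Fraction

# A finite sample space: one roll of a fair die, with probabilities.
sample_space = {outcome: Fraction(1, 6) for outcome in range(1, 7)}


def X(outcome):
    """A real variable: a function out of the sample space."""
    return outcome


def f(x):
    return x * x


def f_of_X(outcome):
    """f(X), defined by composition f ∘ X."""
    return f(X(outcome))


def expectation(variable):
    return sum(p * variable(w) for w, p in sample_space.items())


print(expectation(X))       # 7/2
print(expectation(f_of_X))  # 91/6
```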

    So $x$, $y$, $t$, $u$, etc. are all variables (usually smooth ones). What then is $f$? It's a function, but I mean ‘function’ in the sense used in elementary algebra, that is, a partial function (a partial morphism in the category of sets) from $\mathbb{R}$ to $\mathbb{R}$, usually a smooth one (so infinitely differentiable wherever defined). Quantities like $f(x)$ are defined at the formal level by composition; we don't write this as $f \circ x$ because we conceptually distinguish variables (with arbitrary unspecified domain) from functions (with domain a specified subset of $\mathbb{R}$). So (pace Michael #21) $x$, $y$, $r$, and $\theta$ may indeed be functions on the plane, but we're not treating them in the same way as $f$, so they use different notation. (And then they might not be functions on the plane; if you're really studying the motion of a particle in the plane, they might be better thought of as functions of time.)
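    The distinction between a variable and a partial function on $\mathbb{R}$ can be sketched as follows, assuming, for illustration, that $f$ is the square root (returning `None` outside its domain) and that $x$ is a position given as a function of time; $f(x)$ is then the composite, defined exactly at those times where $x$ lands in the domain of $f$. The names are hypothetical:

```python
import math


def f(value):
    """A partial function R -> R: the square root, defined on [0, ∞)."""
    return math.sqrt(value) if value >= 0 else None


def x(time):
    """A variable: position of a particle as a function of time."""
    return 4.0 - time


def f_of_x(time):
    """The quantity f(x), formally the composite f ∘ x, partial in time."""
    return f(x(time))


print([f_of_x(t) for t in (0.0, 3.0, 5.0)])   # [2.0, 1.0, None]
```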

    I won't even get into the problems with notation for second derivatives and multivariable calculus. In general, all of this stuff works better with differentials than with derivatives (right down to the terms ‘differential’ and ‘derivative’), but I've said enough for now.


    1. Well, a nonstationary variable quantity. 

    2. Here, a space is simply an object of whatever category is relevant. The category of sets and functions may or may not be the most appropriate category. 

    • CommentRowNumber24.

    @Toby: Interesting comments.

    Just a few questions

    1) How do you get the point across to students that sometimes $x$ stands for the value of a function (or a constant, as in your first paragraph) and sometimes $x$ represents a variable quantity (a morphism in a category), as in your paragraphs 3 and 4? Especially if we use the same symbol.

    2) Why is it important to make a conceptual distinction between functions with unspecified domains (like $x, t, r$) and functions whose domains are subsets of $\mathbb{R}^n$, like $f$, and to use different notation for their compositions?

    Another remark: from the perspective of interpreting $x, y, t$ etc. as morphisms in a suitable category, a common abuse of notation I see is that the pullback of a function along a map often gets denoted with the same symbol. So, for example, when describing the motion of a particle in the plane by saying $x, y$ are functions of time, I’d interpret this as saying that $x, y$ are actually the pullbacks of the standard coordinates $x, y$ on the plane to the 1-dimensional manifold representing time, along the map given by the motion.

    • CommentRowNumber25.

    I think I have a partial answer to 2): functions with values in $\mathbb{R}$ but unspecified domains cannot in general be composed with each other, but we may always compose them with functions from $\mathbb{R}$ to itself. Is that the reason?

    • CommentRowNumber26.
    • CommentAuthorMike Shulman
    • CommentTimeNov 19th 2013

    Thanks Toby! I think you’ve resolved my confusion by saying that the manifold in question should be an arbitrary one (the domain of generalized elements), not even necessarily 1-dimensional. I hadn’t thought about $d\, f(x)/dx$, but you’re right that that makes perfect sense.

    My explanation of substitution would have been a bit different. I would say that when $x$ is a variable, then $dx$ is a new variable, albeit one which happens to be related to $x$ in a certain way. (In other words, we now consider generalized elements of the tangent bundle $T\mathbb{R}$ rather than the manifold $\mathbb{R}$.) So substituting 3 for $x$ doesn’t make $dx$ into $d(3)$. Rather, it just means that instead of $dx$ being a small variation about $x$, it is a small variation about 3.

    I would really like to hear what you have to say about second derivatives and multivariable calculus, if you have time sometime to share it. I have recently discovered another problem with the $f'$ notation: prime is a very small symbol and hard to see from the back of the room! (-:

    • CommentRowNumber27.
    • CommentAuthorZhen Lin
    • CommentTimeNov 19th 2013

    The second derivative is not really a notion that goes well with differential forms. After all, $\mathrm{d}^2 = 0$! In the one-variable case one can cheat and observe that the cotangent bundle is generated by a global section $\mathrm{d} x$, so there is a unique function $\frac{\mathrm{d} f}{\mathrm{d} x}$ such that $\mathrm{d} f = \frac{\mathrm{d} f}{\mathrm{d} x} \mathrm{d} x$, and since $\frac{\mathrm{d} f}{\mathrm{d} x}$ is a function, we can talk about its derivative again. (Note, we can do this equally well on the real line or the circle!) This cheat can be made to work in the many-variable case provided the tangent bundle is trivial, but it’s far more obvious that one is doing some very coordinate-dependent things then. (One thing that is always very confusing is that the partial differential operator $\frac{\partial}{\partial x^i}$ depends on the choice of the other coordinates as well, despite the notation! This is in contrast to the differential 1-form $\mathrm{d} x^i$, which depends only on the coordinate function $x^i$.)
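    The warning about $\frac{\partial}{\partial x^i}$ can be checked numerically. In this Python sketch (the example is an assumption chosen for illustration), the function is $g = y$ on the plane. In the chart $(x, y)$ we get $\partial g/\partial x = 0$; in the chart $(x, w)$ with $w = x + y$ we have $g = w - x$ and $\partial g/\partial x = -1$, even though the first coordinate function $x$ is the same in both charts:

```python
def g_xy(x, y):
    """g = y, written in the coordinates (x, y)."""
    return y


def g_xw(x, w):
    """The same g, written in the coordinates (x, w) where w = x + y."""
    y = w - x
    return g_xy(x, y)


def partial_first(func, a, b, h=1e-6):
    """Partial derivative in the first coordinate, the other held fixed."""
    return (func(a + h, b) - func(a - h, b)) / (2 * h)


x0, y0 = 1.0, 2.0
print(partial_first(g_xy, x0, y0))        # 0.0: holding y fixed
print(partial_first(g_xw, x0, x0 + y0))   # approximately -1: holding w fixed
```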

    I think one is supposed to think about jet bundles if one wants to do higher-order derivatives. But that seems a step too far for first-year calculus.

    • CommentRowNumber28.
    • CommentAuthorMike Shulman
    • CommentTimeNov 19th 2013

    Certainly jet bundles are themselves a step too far for first-year calculus, but then so are tangent bundles. The trick would be to find a way to have them in the background causing things to make sense, but not needing to be mentioned explicitly.

    The most direct thing to see is the iterated tangent bundle: if $dx$ is a variable element of $T\mathbb{R}$, then $d(dx)$ is a variable element of $T(T\mathbb{R})$. But unfortunately that is a bit bigger than the jet bundle…

    • CommentRowNumber29.
    • CommentAuthorMike Shulman
    • CommentTimeNov 19th 2013

    I think that the next time I teach calculus, I’m going to bring in differentials and linear approximations much earlier. This time I waited to use them as an explanation for the chain rule, but then once we had them I found that I liked using them for everything else. (Thanks Toby for stressing this point, here and elsewhere!) From that point of view, the ordinary differential is determined by a linear approximation

    $$f(x+dx) \simeq f(x) + d(f(x))$$

    to first order in $dx$, so an appropriate meaning of “second differential” would be a quadratic approximation

    $$f(x+dx) \simeq f(x) + d(f(x)) + \frac{1}{2!} d^2(f(x))$$

    to second order in $dx$. And with this meaning of $d^2(f(x))$ we do have a literal quotient

    $$f''(x) = \frac{d^2(f(x))}{dx^2}.$$

    I haven’t yet worked out the best way to explain “first order in $dx$”, though. And I don’t know what I would say to explain the $2!$, either.
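One way to see where the $2!$ comes from, offered as a sketch rather than as the intended classroom explanation: do arithmetic with numbers $a + b\,\varepsilon + c\,\varepsilon^2$ in which every power $\varepsilon^3$ and higher is discarded, and expand a polynomial at $x + \varepsilon$. The coefficient of $\varepsilon$ is $f'(x)$ and the coefficient of $\varepsilon^2$ is $f''(x)/2!$. The class name `Jet2` and the sample polynomial $f(x) = x^3 - 2x$ are hypothetical choices for illustration, not a library API.

```python
from fractions import Fraction


class Jet2:
    """a + b*eps + c*eps**2 with eps**3 discarded (hypothetical helper, not a library API)."""

    def __init__(self, a, b=0, c=0):
        self.a, self.b, self.c = Fraction(a), Fraction(b), Fraction(c)

    @staticmethod
    def lift(other):
        # Treat plain numbers as jets with no infinitesimal part.
        return other if isinstance(other, Jet2) else Jet2(other)

    def __add__(self, other):
        o = Jet2.lift(other)
        return Jet2(self.a + o.a, self.b + o.b, self.c + o.c)

    __radd__ = __add__

    def __sub__(self, other):
        o = Jet2.lift(other)
        return Jet2(self.a - o.a, self.b - o.b, self.c - o.c)

    def __mul__(self, other):
        o = Jet2.lift(other)
        # Expand the product and drop every term of order eps**3 or higher.
        return Jet2(self.a * o.a,
                    self.a * o.b + self.b * o.a,
                    self.a * o.c + self.b * o.b + self.c * o.a)

    __rmul__ = __mul__

    def __pow__(self, n):
        result = Jet2(1)
        for _ in range(n):
            result = result * self
        return result


def f(x):
    return x**3 - 2 * x  # so f'(x) = 3x^2 - 2 and f''(x) = 6x


x0 = 3
y = f(Jet2(x0, 1))  # f(x + dx) at x = 3, with dx playing the role of eps
print(y.a, y.b, y.c)
print("f'(3) =", y.b, " f''(3) = 2! * c =", 2 * y.c)
```

In this picture the $2!$ is simply the Taylor coefficient: the $\varepsilon^2$ term of $f(x + \varepsilon)$ is $\frac{1}{2!}f''(x)\,\varepsilon^2$.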

    • CommentRowNumber30.
    • CommentAuthorTobyBartels
    • CommentTimeNov 20th 2013

    Zhen Lin #27 has succinctly pointed out the problems.

    Here is a more explicit problem with the usual notation for second derivatives: it breaks the chain rule. If $u = f(x)$ and $y = g(u)$, then

    $$\mathrm{d}y = g'(u) \,\mathrm{d}u = g'(u) \,(f'(x) \,\mathrm{d}x) = g'(f(x)) \,f'(x) \,\mathrm{d}x$$

    works out fine; it gives

    $$(g \circ f)'(x) = \mathrm{d}y/\mathrm{d}x = g'(f(x)) \,f'(x)$$

    as it should. But (using the proposed second differential from Mike #29, which is implicitly endorsed by the notation $\mathrm{d}^2{y}/\mathrm{d}x^2$)

    $$\mathrm{d}^2{y} = g''(u) \,\mathrm{d}u^2 = g''(u) \,(f'(x) \,\mathrm{d}x)^2 = g''(f(x)) \,f'(x)^2 \,\mathrm{d}x^2$$

    is no good; it gives

    $$(g \circ f)''(x) = \mathrm{d}^2{y}/\mathrm{d}x^2 = g''(f(x)) \,f'(x)^2 ,$$

    which is incorrect. The correct formula is

    $$(g \circ f)''(x) = g''(f(x)) \,f'(x)^2 + g'(f(x)) \,f''(x) .$$
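Toby's counterexample is easy to confirm numerically. This sketch picks $f = \sin$ and $g(u) = u^2$ (an arbitrary choice for illustration), estimates $(g \circ f)''(x)$ by a central second difference, and compares it with the naive formula $g''(f(x))\,f'(x)^2$ and with the correct one that adds $g'(f(x))\,f''(x)$. The helper `second_derivative` is a plain finite-difference formula, not a library function.

```python
import math


def second_derivative(h, x, step=1e-4):
    """Central finite-difference estimate of h''(x)."""
    return (h(x + step) - 2 * h(x) + h(x - step)) / step**2


# Sample functions (an assumption for illustration): f = sin, g(u) = u**2.
f, df, d2f = math.sin, math.cos, (lambda x: -math.sin(x))
g, dg, d2g = (lambda u: u**2), (lambda u: 2 * u), (lambda u: 2.0)

x = 0.7
numeric = second_derivative(lambda s: g(f(s)), x)
naive = d2g(f(x)) * df(x)**2                           # g''(f(x)) f'(x)^2 only
correct = d2g(f(x)) * df(x)**2 + dg(f(x)) * d2f(x)      # plus g'(f(x)) f''(x)

print(f"finite difference: {numeric:.6f}")
print(f"naive formula:     {naive:.6f}")
print(f"correct formula:   {correct:.6f}")
```

With this choice the correct formula reduces to $2\cos 2x$, while the naive one gives $2\cos^2 x$, so the two differ by $2\sin^2 x$.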

    For this reason I never write $\mathrm{d}^2{y}/\mathrm{d}x^2$ in class (except once, about the same time that I write $y'$, to warn against it); I write $(\mathrm{d}/\mathrm{d}x)^2{y}$ (or even $\mathrm{d}(\mathrm{d}y/\mathrm{d}x)/\mathrm{d}x$, when it comes naturally) instead.

    More simply (but you might not believe it if I make it so simple right away), $\mathrm{d}^2{u}/\mathrm{d}u^2 = 0$ implies $\mathrm{d}^2{u} = 0$, which implies $\mathrm{d}^2{u}/\mathrm{d}x^2 = 0$, and that's not what we want at all.

    • CommentRowNumber31.
    • CommentAuthorTobyBartels
    • CommentTimeNov 20th 2013

    @ Michael #24:

    Why is it important to make a conceptual distinction between functions with unspecified domains (like $x, t, r$) and functions with domains subset of $\mathbb{R}^n$ like $f$, and use different notation for their compositions?

    Partly because function notation is so convenient, yet it requires a domain, and sometimes we don't want to specify it (or even to imply that it's a subset of $\mathbb{R}^n$, much less a subset of $\mathbb{R}$ in first-term Calculus). So I want to write $f(3)$ and $f'(3)$; of course, I can also write $y|_{x=3}$ and $(\mathrm{d}y/\mathrm{d}x)|_{x=3}$ … but I can also write $f(x+1)$, while I can't write $y|_{x=x+1}$. I guess that this is basically what you said, the ability to compose functions.

    The distinction is especially relevant in applications. Here, the domain of the variables is a vaguely unspecified space of states of the world. The textbooks sometimes encourage us to identify this space (often it can be identified with the time line, for example), so that there is a single independent variable (or a few in the multivariable case) of which every other variable is a function. But the whole point of the Chain Rule is that the independent variable is irrelevant! It's sufficient that some choice of independent variables is possible (that is that some space of states can be assumed to exist), but it's completely unnecessary to actually make this choice. So I don't actually want to say that $x, y, t$ etc. are functions at all (to the students, for whom a function is defined on a subset of some $\mathbb{R}^n$). Yet functions like $(x \mapsto \mathrm{e}^x)$ are also around, and I want to refer to them from time to time too.

    How do you bring the point across to students that sometimes $x$ stands for the value of a function (or constant, as in your first paragraph) and sometimes $x$ represents a variable quantity (morphism in a category) as in your paragraphs 3 and 4? Especially if we use the same symbol.

    What's happening at the most basic level is a change of context (in the technical sense as in type theory). There is the general context where the variables are allowed to vary as much as they may, and then there is the more specific context where $x$ is set to (say) $3$. (There are other intermediate contexts, especially in multivariable Calculus, such as that given by a constraint as in optimization problems.) Assuming for the sake of argument (but this is hardly necessary) that there is exactly one possible state of the world in which $x = 3$, then we have a morphism from the point to the space of all world-states; as you said (but I didn't quote), we are taking a pullback along this morphism and abusing notation by keeping the same symbol $x$.

    I tell my students, particularly when working out word problems, to keep careful track of the context. (I use the word ‘context’ but don't let them suspect that it's a technical term in logic!) In a typical problem, they have an equation that holds always, which they may differentiate; but then later they use equations that are only true for an instant. (Related rates and optimization are two broad categories of problems like this.) I tell them that any result from the equations that hold always is also true for an instant, but not conversely (which I think makes intuitive sense); and you cannot differentiate equations that only hold for an instant, because nothing is changing in that instant! (In multivariable Calculus you can differentiate equations that hold under a constraint, and this leads for example to Lagrange multipliers, but you still have to remember that you are working relative to a constraint.) So basically, I'm allowing them to pull back results along a morphism but not push them forward; but I try to make it sound like common sense instead of a theorem of categorial logic!

    • CommentRowNumber32.
    • CommentAuthorTobyBartels
    • CommentTimeNov 20th 2013

    the partial differential operator $\frac{\partial}{\partial x^i}$ depends on the choice of the other coordinates as well, despite the notation!

    In a thermodynamics course, I learnt the notation $(\partial{U}/\partial{S})_T$ for the partial derivative of $U$ with respect to $S$ when $T$ is held constant. (In general, there are $n - 1$ subscripts.) In my multivariable class, I introduce this notation first, then say that we can drop the subscripts as an abuse of notation when it's obvious what they're going to be. Of course, all of the abuses of notation can be justified in this way (that it's obvious what it's supposed to mean), but only this one is really necessary, since otherwise it gets very tedious.
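Zhen Lin's warning, and the reason for the subscript, can be seen in a small numerical example (hypothetical, chosen for illustration): take $z = x + y$ and compare two coordinate systems that share the coordinate $x$, namely $(x, y)$ and $(x, w)$ with $w = x + y$. Holding $y$ fixed gives $(\partial z/\partial x)_y = 1$, while holding $w$ fixed gives $(\partial z/\partial x)_w = 0$, even though $\mathrm{d}x$ is the same 1-form in both. The helper `partial` is a plain central difference, not a library function.

```python
def partial(h, point, index, step=1e-6):
    """Central-difference partial derivative of h at point along coordinate `index`,
    holding the other coordinates of this particular chart fixed."""
    lo, hi = list(point), list(point)
    lo[index] -= step
    hi[index] += step
    return (h(*hi) - h(*lo)) / (2 * step)


# The quantity z = x + y written in two hypothetical charts that share the coordinate x.
def z_in_xy(x, y):
    return x + y          # chart (x, y)


def z_in_xw(x, w):
    return w              # chart (x, w) with w = x + y, so z = w


x0, y0 = 2.0, 5.0
w0 = x0 + y0              # the same point, expressed in the second chart

dz_dx_holding_y = partial(z_in_xy, (x0, y0), 0)   # (dz/dx)_y
dz_dx_holding_w = partial(z_in_xw, (x0, w0), 0)   # (dz/dx)_w
print(dz_dx_holding_y, dz_dx_holding_w)
```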

    By the way, Mike, the corresponding notation for the partial derivatives of a function is $f_i$ (where $i = 1, 2, ...$); it works just like $f_y$ in hilbertthm90 #2, only it's legitimate. An alternative is $\mathrm{D}_i{f}$ (especially if you want to leave subscripts free for a sequence or other family of functions). This extends to higher-order derivatives just fine, without having to assume commutativity (the claim that $\mathrm{D}_{i,j} = \mathrm{D}_{j,i}$).

    • CommentRowNumber33.
    • CommentAuthorMike Shulman
    • CommentTimeNov 20th 2013

    @Toby #30: very interesting, thanks! How about this for a second try at “second differentials”?

    The general form of a second-order approximation should include not only a first-order change in the variable but also an independent second-order change. That is, let $\mathrm{d}x$ be a first-order infinitesimal and $\mathrm{d}^2x$ a second-order one, and work up to second-order; thus $(\mathrm{d}x)^2$ is relevant but $(\mathrm{d}x)^3$ and $(\mathrm{d}^2x)^2$ and $(\mathrm{d}x)(\mathrm{d}^2x)$ can be neglected as being third- or fourth-order (or are equal to zero, depending on your preferred flavor of infinitesimal). Thus while it is true that

    $$f(x+\mathrm{d}x) = f(x) + f'(x)\, \mathrm{d}x + \frac{1}{2} f''(x)\, (\mathrm{d}x)^2$$

    we also have (by the same reasoning)

    $$f(x+\mathrm{d}x + \frac{1}{2} \mathrm{d}^2x) = f(x) + f'(x)\, \mathrm{d}x + \frac{1}{2} f'(x)\, \mathrm{d}^2x + \frac{1}{2} f''(x) \, (\mathrm{d}x)^2$$

    and it is the third and fourth terms here, taken together, that should be called $\frac{1}{2}\mathrm{d}^2(f(x))$, since they are both second-order. That is, we write

    $$f(x+\mathrm{d}x + \frac{1}{2} \mathrm{d}^2x) = f(x) + \mathrm{d}(f(x)) + \frac{1}{2} \mathrm{d}^2(f(x))$$

    and therefore

    $$\mathrm{d}^2(f(x)) = f'(x)\, \mathrm{d}^2 x + f''(x)\, (\mathrm{d}x)^2.$$

    Now if $u = f(x)$ and $y = g(u)$, we have

    $$\begin{aligned} \mathrm{d}^2 y &= g'(u) \, \mathrm{d}^2 u + g''(u) \, (\mathrm{d}u)^2 \\ &= g'(f(x)) \Big(f'(x) \, \mathrm{d}^2 x + f''(x)\, (\mathrm{d}x)^2\Big) + g''(f(x)) (f'(x)\, \mathrm{d}x)^2\\ &= g'(f(x)) f'(x) \, \mathrm{d}^2 x + \Big(g'(f(x)) f''(x) + g''(f(x)) (f'(x))^2\Big) (\mathrm{d}x)^2 \end{aligned}$$

    And matching this with

    $$\mathrm{d}^2((g\circ f)(x)) = (g\circ f)'(x)\, \mathrm{d}^2 x + (g\circ f) ''(x)\, (\mathrm{d}x)^2$$

    we recover the correct chain rules for both first and second derivatives.
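This computation can be checked mechanically under one assumption: that “working up to second order” may be modelled by truncated arithmetic $a + b\,\varepsilon + c\,\varepsilon^2$ with $\varepsilon^3 = 0$, where $x$ is moved to $x + \mathrm{d}x\,\varepsilon + \frac{1}{2}\mathrm{d}^2x\,\varepsilon^2$. Then $\mathrm{d}(f(x))$ is the coefficient of $\varepsilon$ and $\mathrm{d}^2(f(x))$ is twice the coefficient of $\varepsilon^2$. The sketch uses exact rationals, the hypothetical polynomials $f(x) = x^3 - 2x$ and $g(u) = u^2 + u$, and arbitrary sample values $x = 3$, $\mathrm{d}x = 5$, $\mathrm{d}^2x = 7$; `Jet2` and `differentials` are illustrative names, not a library API.

```python
from fractions import Fraction


class Jet2:
    """Hypothetical helper: a + b*eps + c*eps**2, with eps**3 treated as 0."""

    def __init__(self, a, b=0, c=0):
        self.a, self.b, self.c = Fraction(a), Fraction(b), Fraction(c)

    @staticmethod
    def lift(o):
        return o if isinstance(o, Jet2) else Jet2(o)

    def __add__(self, o):
        o = Jet2.lift(o)
        return Jet2(self.a + o.a, self.b + o.b, self.c + o.c)

    __radd__ = __add__

    def __sub__(self, o):
        o = Jet2.lift(o)
        return Jet2(self.a - o.a, self.b - o.b, self.c - o.c)

    def __mul__(self, o):
        o = Jet2.lift(o)
        # Truncated product: terms of order eps**3 and eps**4 are dropped.
        return Jet2(self.a * o.a,
                    self.a * o.b + self.b * o.a,
                    self.a * o.c + self.b * o.b + self.c * o.a)

    __rmul__ = __mul__

    def __pow__(self, n):
        r = Jet2(1)
        for _ in range(n):
            r = r * self
        return r


def differentials(jet):
    """Read off (value, d, d^2) from a jet f(x + dx*eps + (1/2) d2x*eps**2)."""
    return jet.a, jet.b, 2 * jet.c


def f(x):
    return x**3 - 2 * x      # f' = 3x^2 - 2, f'' = 6x


def g(u):
    return u**2 + u          # g' = 2u + 1, g'' = 2


x0, dx, d2x = Fraction(3), Fraction(5), Fraction(7)
x = Jet2(x0, dx, d2x / 2)    # the point x + dx + (1/2) d^2x

u0, du, d2u = differentials(f(x))
y0, dy, d2y = differentials(g(f(x)))

fp, fpp = 3 * x0**2 - 2, 6 * x0
gp, gpp = 2 * u0 + 1, 2

# d^2(f(x)) = f'(x) d^2x + f''(x) dx^2
assert d2u == fp * d2x + fpp * dx**2
# Second differential of the composite, matched against the chain-rule expansion.
assert d2y == gp * fp * d2x + (gp * fpp + gpp * fp**2) * dx**2
print(dy, d2y)
```

Trying other values of `dx` and `d2x` is a cheap way to convince oneself that the match is not an accident of this sample point.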

    I suspect that this could be written in terms of jet bundles.

    Of course, now we’ve lost the notation $\frac{\mathrm{d}^2y}{(\mathrm{d}x)^2}$ for the second derivative. Unless there’s some reason why we can assume $\mathrm{d}^2x=0$.

  5. @Toby: thanks for the detailed answer!

    Concerning Mike's nice reasoning in #33:

    Of course, now we’ve lost the notation $\frac{d^2y}{(dx)^2}$ for the second derivative.

    If I understood this correctly, you might still “save” it by using the notation $\frac{\partial^2y}{(dx)^2}$, or should it be $\frac{\partial^2y}{(\partial x)^2}$?

    • CommentRowNumber35.
    • CommentAuthorMike Shulman
    • CommentTimeNov 20th 2013

    @Michael: I didn’t use the notation $\partial^2 y$; what are you thinking that it would mean? The problem that I saw is that $\mathrm{d}^2 y$ is not a linear function of $(\mathrm{d}x)^2$ alone, but also of $\mathrm{d}^2x$.

    • CommentRowNumber36.
    • CommentAuthorMike Shulman
    • CommentTimeNov 20th 2013

    @Toby, I have a couple of terminological questions for you.

    1) What do you call “$x^2+1$”? In Lawvere’s parlance it is still a “variable quantity”, but as it is not syntactically a variable I wouldn’t want to call it that. But neither is it a “function” in your setup, if I understood correctly.

    2) What do you call an object like “$(x^2+1)\;\mathrm{d}x$” which you can take the integral of?

    • CommentRowNumber37.
    • CommentAuthorTobyBartels
    • CommentTimeNov 20th 2013
    • (edited Nov 20th 2013)

    $\mathrm{d}^2(f(x)) = f'(x)\, \mathrm{d}^2 x + f''(x)\, (\mathrm{d}x)^2$.

    I agree; I first got this by writing $\mathrm{d}f(x) = f'(x) \,\mathrm{d}x$ and applying the product rule. Notice that the $\mathrm{d}$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.

    Of course, now we’ve lost the notation $\frac{\mathrm{d}^2y}{(\mathrm{d}x)^2}$ for the second derivative.

    Right, and I don't know how to rehabilitate that notation, because of my last remark in #30.

    However, you can write $\frac{\partial^2{y}}{(\partial{x})^2}$ instead! This is because, just as $\frac{\partial{y}}{\partial{x}}$ is the coefficient on $\mathrm{d}x$ in an expansion of $\mathrm{d}y$, so $\frac{\partial^2{y}}{(\partial{x})^2}$ is the coefficient on $(\mathrm{d}x)^2$ in an expansion of $\mathrm{d}^2y$. (ETA: Michael already noticed this in #34, but hopefully my explanation of it helps.)

    By the way, I also tell my students that $\mathrm{d}$ binds more tightly than any non-differential operation like squaring, so I can write $\mathrm{d}x^2$ instead of $(\mathrm{d}x)^2$. (Then I always use parentheses in something like $\mathrm{d}(x^2) = 2x \,\mathrm{d}x$.)

    • CommentRowNumber38.
    • CommentAuthorTobyBartels
    • CommentTimeNov 20th 2013
    • (edited Nov 20th 2013)

    Answering Mike's questions in #36:

    1. I call $x^2 + 1$ simply a quantity. I might even call it a real number (but not a constant one) or even simply a number, but ‘quantity’ usually works well (except in applications to supply and demand, where ‘quantity’ has a more specific meaning). It is a variable quantity, of course, and I might even point out that it varies or may even say ‘$x^2 + 1$ is variable.’, but not ‘$x^2 + 1$ is a variable.’; that would be confusing. (In other words, when describing the quantity as a whole, I would use ‘variable’ only as an adjective.) Also, I will say that $x^2 + 1$ is a function of $x$,¹ which simply means that there exists a function $f$ such that $x^2 + 1 = f(x)$. If I'm emphasizing the logical form, then I may also call it an algebraic expression; technically, the expression ‘$x^2 + 1$’ represents or stands for the quantity $x^2 + 1$.

    2. Sometimes I call $(x^2 + 1) \,\mathrm{d}x$ an infinitesimal quantity, if I want to emphasize its interpretation as something infinitely small (and similarly I might call $x^2 + 1$ a finitesimal quantity). But if I'm talking about things that one can integrate, then I usually call it a differential form. I actually introduce that term fairly early, when I remark that the differential of any (finitesimal) expression (in any number of variables!) will be a differential form; I point out that every term has a differential as one factor, note that this makes every term (and hence the sum) an infinitesimal quantity, and then I introduce the name for expressions of this form. (I also remark that the differential form has rank $1$ because only $1$ factor of each term is a differential, but we don't have to say that since differential forms of higher rank are only used in multivariable Calculus. And then in my multivariable class, I use them!)


    1. In case your browser fails to render <ins> in combination with MathML (as mine does), the ‘$x$’ should also be underlined (assuming that your browser renders <ins> as underlining, which they nearly all do). This is just for emphasis.

    • CommentRowNumber39.
    • CommentAuthorMichael_Bachtold
    • CommentTimeNov 20th 2013
    • (edited Nov 20th 2013)

    Toby in #37 wrote:

    I first got this by writing $\mathrm{d}f(x) = f'(x) \,\mathrm{d}x$ and applying the product rule. Notice that the $\mathrm{d}$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.

    Interesting! Would you mind telling a bit more about this $d$ as a commutative operator and how the equation follows from the chain rule. Studying synthetic differential geometry is still on my TODO list, so apologies if this is standard knowledge among experts.

    I also still need to understand Mike's computation in #33. Intuitively I would have thought that a first order infinitesimal is also an infinitesimal of second order, so I find it confusing that we need to include the first order change separately when looking at the effect of a second order change. (And I also have no intuition for what it means that the two changes are independent, from a geometric or physical perspective.)

    Edit: I’m also still curious if Mike's suggestion from #26 can be made consistent with the different interpretations of $d$ suggested above

    So substituting 3 for $x$ doesn’t make $dx$ into $d(3)$. Rather, it just means that instead of $dx$ being a small variation about $x$, it is a small variation about 3.

    • CommentRowNumber40.
    • CommentAuthorMike Shulman
    • CommentTimeNov 21st 2013

    So would it then be correct to also write $f'(x) = \frac{\partial^2 y}{\partial^2 x}$?

    Intuitively I would have thought that a first order infinitesimal is also an infinitesimal of second order

    Actually, it’s the other way around: a second order infinitesimal is also a first order one (although to first order, it’s zero). Higher order means a smaller number.

    • CommentRowNumber41.
    • CommentAuthorTobyBartels
    • CommentTimeNov 21st 2013

    I first got this by writing $\mathrm{d}f(x) = f'(x) \,\mathrm{d}x$ and applying the product rule. Notice that the $\mathrm{d}$ here is not the exterior derivative but instead a commutative (rather than supercommutative) operator.

    Interesting! Would you mind telling a bit more about this $d$ as a commutative operator and how the equation follows from the chain rule.

    I don't know very much about that operator; I asked about this stuff once on Math Overflow and got no clear answer, although I did get a reference that I haven't followed up yet. But if I just assume that it continues to obey the usual rules, then I can calculate with it just fine. In this case:

    $$\mathrm{d}^2f(x) = \mathrm{d}(\mathrm{d}f(x)) = \mathrm{d}(f'(x) \,\mathrm{d}x) = \mathrm{d}(f'(x)) \,\mathrm{d}x + f'(x) \,\mathrm{d}(\mathrm{d}x) = (f''(x) \,\mathrm{d}x) \,\mathrm{d}x + f'(x) \,\mathrm{d}^2x = f''(x) \,\mathrm{d}x^2 + f'(x) \,\mathrm{d}^2x .$$

    So would it then be correct to also write $f'(x) = \frac{\partial^2 y}{\partial^2 x}$?

    Apparently so! But of course $\frac{\partial y}{\partial x}$ is simpler.

  6. @Mike #40:

    Actually, it’s the other way around: a second order infinitesimal is also a first order one (although to first order, it’s zero). Higher order means a smaller number.

    Here’s how I was thinking: a first order infinitesimal number is one with $\epsilon^2=0$, a second order is one with $\epsilon^3=0$. So first order is also of second order. Actually when I picture infinitesimal neighborhoods of a point or subset in a manifold or whatever space, I always thought that they increase as the order increases. Did I get it wrong or are we maybe talking of dual things?

    @Toby #41: I recall that question on MathOverflow (one of the comments was mine). Unfortunately I also have not had the time to follow up on the references. But I do find the infinitesimals and differentials approach advocated by Dray and Manogue to be worthwhile for teaching calculus. And since Mike arrived at the same equation as you by slightly different reasoning, it is even more compelling to believe that maybe there is still something to be understood in the interpretation of $d$ or $d^2$.

    • CommentRowNumber43.
    • CommentAuthorMike Shulman
    • CommentTimeNov 21st 2013

    Yes, I think we are using language in dual ways. I’m thinking of nonstandard-analysis-style infinitesimals, whose square or cube is never actually equal to zero. Instead I’m saying, let’s fix some particular “scale” infinitesimal $\epsilon$; then a first-order infinitesimal is one $\eta$ such that $\eta/\epsilon$ is finite (“limited”), a second-order one is such that $\eta/\epsilon^2$ is finite, etc.

    Then when we work “up to first order”, which we could formalize as being in the quotient ring of limited numbers modulo $\epsilon^2$, the square of a first-order infinitesimal can be neglected. And when we work “up to second order”, i.e. in the limited numbers modulo $\epsilon^3$, the cube of a first-order infinitesimal and the square of a second-order one can be neglected. So in the latter quotient ring, a first-order $\eta$ has $\eta^3=0$ while a second-order one has $\eta^2=0$.

    I think that this matches the use of phrases like “to first order” and “a first order change” in ordinary (non-infinitesimal) language better. A second order change is negligible if we are working to first order, but not if we are working to second order, yet the amount of the change itself is the same in both cases; what changes is our attitude towards it. But I guess it doesn’t apply as well to SDG-style nilpotent infinitesimals, so with those it may be better to avoid terms “first order” and “second order” and talk instead about “nilsquare” and “nilcube” etc.
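A toy model of the two quotient rings described above, assuming formal polynomials in a symbol $\varepsilon$ may stand in for the limited hyperreals: working “up to first order” keeps only the coefficients of $\varepsilon^0$ and $\varepsilon^1$, and working “up to second order” also keeps $\varepsilon^2$. It shows the same first-order quantity having a negligible square in the first ring but not in the second, which is the point that only our attitude toward it changes. The function names and sample coefficients are hypothetical.

```python
def mul_mod(p, q, order):
    """Multiply coefficient lists in eps, keeping only eps**0 .. eps**order
    (i.e. computing in polynomials modulo eps**(order + 1))."""
    out = [0] * (order + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= order:
                out[i + j] += pi * qj
    return out


def power_mod(p, n, order):
    """p**n in the same truncated ring."""
    result = [1] + [0] * order
    for _ in range(n):
        result = mul_mod(result, p, order)
    return result


def is_zero(p):
    return all(c == 0 for c in p)


first_order = [0, 3]       # eta = 3*eps       (hypothetical sample value)
second_order = [0, 0, 5]   # eta = 5*eps**2    (hypothetical sample value)

print(is_zero(power_mod(first_order, 2, order=1)))   # True: negligible "to first order"
print(is_zero(power_mod(first_order, 2, order=2)))   # False: not negligible "to second order"
print(is_zero(power_mod(first_order, 3, order=2)))   # True: its cube is negligible
print(is_zero(power_mod(second_order, 2, order=2)))  # True: square of a second-order one
```

In the second ring $\varepsilon$ itself is nilcube but not nilsquare, which matches the suggestion to speak of “nilsquare” and “nilcube” for SDG-style infinitesimals.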

  7. Mike thanks for the explanation! I’ll have to think about it some more to resolve the conflicting views in my head.

    In the meantime, here is something slightly related to the original question. In my calculus class last week I asked the students to answer the following questions

    Compute the derivative of:

    1. $\int_2^x \ln(t^2+1)dt$ with respect to $x$

    2. $\int_2^x \ln(t^2+1)dt$ with respect to $t$

    3. $\int \ln(t^2+1)dt$ with respect to $t$

    (the last one is an indefinite integral, I’m using the notations of my calculus book here (Hughes-Hallett))

    That caused a lot of confusion for my students. My preliminary reaction is to think of the notation for the indefinite integral as the bad guy.
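For question 1 the expected answer comes from the Fundamental Theorem of Calculus, $\frac{d}{dx}\int_2^x \ln(t^2+1)\,dt = \ln(x^2+1)$. A quick numerical sanity check, assuming composite Simpson's rule is accurate enough for this smooth integrand, differentiates the integral by a central difference at the sample point $x = 3$ and compares with $\ln 10$. It says nothing about questions 2 and 3, which are about what the notation means rather than about computation; `simpson` and `F` are illustrative names.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h from a to b (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = h(a) + h(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3


def integrand(t):
    return math.log(t**2 + 1)


def F(x):
    """x -> integral of ln(t^2 + 1) dt from t = 2 to t = x."""
    return simpson(integrand, 2, x)


x, dx = 3.0, 1e-4
derivative = (F(x + dx) - F(x - dx)) / (2 * dx)
print(derivative, integrand(x))   # both should be close to ln(10)
```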

    • CommentRowNumber45.
    • CommentAuthorMike Shulman
    • CommentTimeNov 23rd 2013

    I wouldn’t ask my students (2) or (3). In fact, I’m not sure what you were expecting.

    In (2), are you assuming that $x$ is a function of $t$? Or constant with respect to $t$? Were you hoping that they would write $\ln(x^2+1) \frac{dx}{dt}$? While there’s technically no contradiction in using the same variable both free and bound, it’s bad style even in published mathematical papers, so I wouldn’t want to inflict it on calculus students.

    As for (3), the way I’m used to thinking of it $\int \ln(t^2+1)dt$ is not a function but a class of functions (differing by local constants) — hence not something you can take the derivative of.

    However, you do raise an important point, which is that $t$ is bound in $\int_a^b f(t) dt$ but (sort of) free in $\int f(t) dt$. I’m curious to hear Toby’s take. I introduced indefinite integrals to my class last week by saying that the indefinite integral of a thing (I didn’t say “differential form”, but I might have as Toby suggested) is the most general expression whose differential is that thing. That made perfect sense to me.

    I haven’t done definite integrals yet, but from that point of view, maybe the problem is with the definite integral notation, since we have the limits $a$ and $b$ specified but without indicating in the notation which variable is supposed to take on those values. For instance, in a chain rule / substitution problem, say we have $\int_1^2 2t \cos(t^2) dt$, which we can solve by letting $u = t^2$ so that $du = 2 t dt$ and

    $$2t \cos(t^2) dt = \cos(u) du.$$

    But this equality (of differential forms) is not something to which we can apply the “operation” $\int_1^2$ and get

    $$\int_1^2 2t \cos(t^2) dt = \int_1^2 \cos(u) du.$$

    Instead we have to put $t=1$ and $t=2$ into $u=t^2$ and get

    $$\int_1^2 2t \cos(t^2) dt = \int_1^4 \cos(u) du.$$

    So maybe it would be better to write $\int_{t=1}^2 2t \cos(t^2) dt$ (as we do with summation notation, $\sum_{t=1}^4$) so that we could have

    $$\int_{t=1}^2 2t \cos(t^2) dt = \int_{t=1}^2 \cos(u) du = \int_{u=1}^4 \cos(u) du.$$
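The point about carrying the bounds through a substitution can be checked numerically. This sketch (using composite Simpson's rule, an arbitrary choice of quadrature) shows that $\int_1^2 2t\cos(t^2)\,dt$ agrees with $\int_1^4 \cos u\,du = \sin 4 - \sin 1$ but not with $\int_1^2 \cos u\,du$, where the bounds were copied over unchanged.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule (n even) for the integral of h from a to b."""
    step = (b - a) / n
    total = h(a) + h(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3


lhs = simpson(lambda t: 2 * t * math.cos(t**2), 1, 2)   # t runs from 1 to 2
wrong = simpson(math.cos, 1, 2)                          # bounds copied over unchanged
right = simpson(math.cos, 1, 4)                          # bounds moved through u = t^2

exact = math.sin(4) - math.sin(1)                        # antidifferential sin(u) from u = 1 to u = 4
print(lhs, right, wrong, exact)
```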
  8. In (2), are you assuming that $x$ is a function of $t$? Or constant with respect to $t$?

    Good point. But before I answer (and let me know if you see this differently): the discussion so far showed that there are at least two popular interpretations for “variables” in calculus: one in the sense of “dummy variables” or placeholders for numbers, and one in the sense of “variable quantity” or maybe morphism in a suitable category. It also seems that these two interpretations can lead to conflicts. But I’d be glad to understand this better still.

    Having said that:

    For 2) I was expecting that they answer $0$. From the “dummy variable” perspective $x$ is a placeholder for a number (representing the upper boundary and otherwise not related to $t$), and since the variable $t$ is bound the whole integral does not “change” when we plug in different values for $t$, so the derivative is zero.

    From the “variable quantity” perspective $x$ might depend on $t$ so the correct answer would be as you suggest $\ln(x^2+1) \frac{dx}{dt}$. So to be consistent with the previous we need to assume $x$ is constant with respect to $t$.

    But some things are not yet clear to me about this last answer. I’ll come to it in a moment.

    As for 3) I was expecting that they answer $\ln(t^2+1)$. I also think of the indefinite integral as a family of functions (depending on the same variable $t$ as the differential form). I guess I’m using the convention here that taking the derivative of a family of functions means taking the derivative of each member of the family. Of course in principle the additive constant could still depend on $t$ in some context, which makes things more subtle.

    …maybe the problem is with the definite integral notation, since we have the limits $a$ and $b$ specified but without indicating in the notation which variable is supposed to take on those values.

    I think the standard convention here is that the boundaries $a,b$ of the definite integral always refer to the variable appearing in $dx$ (or $dt$ etc.) so there is seldom ambiguity there. But as you suggest I also emphasize this by writing $\int_{u=1}^4 \cos(u)du$ instead of $\int_{1}^4 \cos(u)du$. In fact I sometimes overemphasize by writing $\int_{u=1}^{u=4} \cos(u)du$, which brings me back to 2).

    If I had written $\int_{t=2}^{t=x} \ln(t^2+1)dt$ and interpreted variables as “variable quantities”, then how should I interpret the equality $t=x$ appearing in the upper boundary? Does it mean that $t$ and $x$ are the same variable quantities? In that case the answer to 2) would be the same as the answer to 3) and it wouldn’t be possible to ask if $x$ is constant with respect to $t$ (also a student might object that it is unnecessary to introduce a new name $x$ to denote the same thing as $t$, which nevertheless is considered bad style as you mention). But I suspect that the thing going on here and elsewhere in the “variable quantity” perspective is that an equality like $t=x$ is interpreted in a way more commonly seen in probability/statistics as in $\{x=t \}$ denoting “the set of all states where the random variables $x$ and $t$ assume the same value.”

    This raises some (maybe sidetracking) questions for me:

    1. if the “variable quantity” perspective can be formalized via arrows in a suitable category, then how does one formalize categorically the notion of two quantities (arrows) being “independent” or “constant” with respect to each other?

    2. What does the “set of states of the world” (also mentioned by Toby) correspond to categorically? Some classifying object?

    These questions are not directly addressed at Mike or Toby, but if you happen to know some answers I won’t complain. :) Apologies if I can’t respond in the next few days.

    • CommentRowNumber47.
    • CommentAuthorMike Shulman
    • CommentTimeNov 24th 2013

    Yes, I’m perfectly aware of the standard convention, and I agree that in practice there is no ambiguity in the meaning of a particular definite integral expression, but I described a situation (integration by substitution) in which the lack of notation could be problematic for a student when manipulating several such expressions.

    As for the meaning of $t=x$, I think more generally one of the things we can do with a “variable quantity” is to set it equal to a particular other quantity. If the other quantity is constant, then it “stops varying” and becomes constant, while if the other quantity is also variable then their variation becomes dependent. For instance, when $x$ and $y$ are variable quantities we write $\left.\frac{dy}{dx}\right|_{x=2}$ for what, if $y=f(x)$, we might also write as $f'(2)$.

    Categorically, variable quantities are morphisms from some domain object, say $\Gamma$ — which I think is what Toby meant by the space of “states of the world” — and setting two such variable quantities equal would correspond to restricting the domain to the equalizer of those two morphisms. That’s probably the same as what you mean by $\{x=t\}$?
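A finite caricature of this, offered only as an illustration: take $\Gamma$ to be a short list of world-states (a hypothetical stand-in for the space of states), let variable quantities be functions on it, and compute the equalizer of $x$ and $t$ as the sub-list of states where they agree. On that sub-domain $x$ and $t$ become the same quantity, which is one reading of $\{x = t\}$. The state labels and values are invented for the example.

```python
# Toy model (an assumption for illustration): Gamma is a finite list of
# world-states, and a "variable quantity" is any function from Gamma to numbers.
GAMMA = [
    {"label": "s0", "x": 1.0, "t": 2.0},
    {"label": "s1", "x": 2.0, "t": 2.0},
    {"label": "s2", "x": 3.0, "t": 3.0},
    {"label": "s3", "x": 4.0, "t": 5.0},
]


def x(state):
    return state["x"]


def t(state):
    return state["t"]


def equalizer(f, g, domain):
    """States on which f and g agree: the subobject {f = g} of the domain."""
    return [s for s in domain if f(s) == g(s)]


restricted = equalizer(x, t, GAMMA)
print([s["label"] for s in restricted])   # ['s1', 's2']

# Restricted to the equalizer, x and t are literally the same variable quantity.
assert all(x(s) == t(s) for s in restricted)
```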

    I think this “fixing the value of a variable quantity” is the same thing that’s happening in a definite integral. Given a differential form like $\ln(t^2+1)dt$ involving a variable quantity $t$ and its differential $dt$, we can integrate this form from one particular value of $t$ to another. These particular values might be constant quantities or other variable quantities (such as variables), and in the latter case the result is again going to be variable.

    I need to think a bit about your first question.

    • CommentRowNumber48.
    • CommentAuthorMike Shulman
    • CommentTimeNov 24th 2013

    I know what it means for a variable quantity $\Gamma \to R$ to be constant: it means that it factors through $1$. I’m not sure about “constant with respect to” some other quantity, though. Maybe that is one of those things which only makes sense if the quantity “with respect to” is part of a given basis, so that we can say that the corresponding partial derivative vanishes?

    • CommentRowNumber49.
    • CommentAuthorTobyBartels
    • CommentTimeNov 25th 2013

    In the context of indefinite integrals, I read ‘\int’ as ‘antidifferential’, since ω\omega is the differential of ω\int \omega; that is, ω\int \omega is an antidifferential of ω\omega. (Of course, when ω=ydx\omega = y \,\mathrm{d}x, the derivative of ydx\int y \,\mathrm{d}x with respect to xx is yy, so ydx\int y \,\mathrm{d}x is also an antiderivative of yy with respect to xx, like they say in the book.) I tend to avoid the term ‘indefinite integral’; it’s bad enough that (almost) the same notation is used for two different concepts (definite and indefinite integrals), and I'd just as soon not use (almost) the same terminology as well.

    I've never liked the idea that the antidifferential of a differential form (or whatever you want to call that) is a set of quantities; I try to say ‘an’ instead of ‘the’ as much as possible. If you just want one antidifferential, then (for example) $\int x^2 \,\mathrm{d}x = x^3/3$ is OK; but if you want all of them, then you need $\int x^2 \,\mathrm{d}x = x^3/3 + C$. (I enforce the book's answers to its problems by saying that it's asking for all of them. And I say that it's only interested in quantities defined on a connected domain, so I don't have to deal with local constants.)

    For definite integrals, I introduce the notation first as $\int_p^q \omega$, where $p$ and $q$ are equations (preferably with unique solutions). Then $\int_{x = a}^b \omega$ is an abbreviation for $\int_{x = a}^{x = b} \omega$ (as Michael wrote) when the left-hand sides are the same; finally, $\int_a^b y \,\mathrm{d}x$ is an abbreviation for $\int_{x=a}^b y \,\mathrm{d}x$ when only one variable's differential appears in the expression for $\omega$. That's mostly how they look, but I encourage them to use a longer form when doing integration by substitution, for the reasons that Mike gives. (This can violate the requirement that $p$ and $q$ have unique solutions. It's sufficient that the result of the integral be the same for any choice of solution, or at least for any choice where the solutions of $p$ and $q$ are connected.)

    The Fundamental Theorem of Calculus has two parts, which are inconsistently numbered. By the numbering in our textbook (which is the way that I learnt it):

    1. $\mathrm{d}(\int_p^q \omega) = {\omega|_p^q}$,
    2. $\int_p^q \mathrm{d}u = {u|_p^q}$.

    Since this is a theorem and needs fine print (about things being continuous and the like), I state and prove these first in function notation like the book does, but I bring up these forms eventually.
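As a quick numerical illustration of part 2 in its simplest case, where $p$ and $q$ are $x=a$ and $x=b$, here is a minimal Python sketch. It compares a Riemann sum for $\int_{x=1}^{x=2} \mathrm{d}u$ with $u|_{x=1}^{x=2}$. The quantity $u = x^3/3$ and the midpoint rule are assumptions chosen for illustration, not anything from the textbook.

```python
def midpoint_integral(g, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of g(x) dx from x = a to x = b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def u(x):
    return x**3 / 3      # example quantity; its differential is du = x**2 dx

def du_coeff(x):
    return x**2          # coefficient of dx in du

a, b = 1.0, 2.0
lhs = midpoint_integral(du_coeff, a, b)   # integral of du from x = 1 to x = 2
rhs = u(b) - u(a)                         # u evaluated between x = 1 and x = 2
print(lhs, rhs)
```

Both numbers come out as $7/3$ up to the small discretization error of the midpoint rule.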

    The variable $t$ is definitely free in both $\mathrm{d}f(t)/\mathrm{d}t$ and $\int f(t) \,\mathrm{d}t$, no ‘sort of’ about it. It's bound in ${f(t)|_{t=a}} = f(a)$, in ${f(t)|_{t=a}^b} = f(b) - f(a)$, and in $\int_{t=a}^b f(t) \,\mathrm{d}t$ (which has a more complicated $t$-free definition). I agree with Mike that $t$ is bound in $\int_2^x \ln(t^2+1)\,dt$, so you can't really differentiate it with respect to $t$, although you could naïvely say that it's $\ln(x^2+1)\,dx/dt$ as Mike suggested. On the other hand, $\int \ln(t^2+1)\,dt$ is fine; by definition, its differential is $\ln(t^2+1)\,dt$, so its derivative with respect to $t$ is $\ln(t^2+1)$. (You don't even need the FTC for this one.)

    • CommentRowNumber50.
    • CommentAuthorTobyBartels
    • CommentTimeNov 25th 2013

    there are at least two popular interpretations for “variables” in calculus: one in the sense of “dummy variables” or placeholders for numbers, and one in the sense of “variable quantity” or maybe morphism in a suitable category

    These two senses can both be incorporated into categorial logic. In the case of an expression like ${x^2|_{x=1}^2}$ (which is an abbreviation of ${x^2|_{x=1}^{x=2}}$ and is usually further abbreviated as ${x^2|_1^2}$), we start with a real-valued quantity $x$ in some context $\Gamma$ (formally a morphism $x\colon \Gamma \to \mathbb{R}$). The equations $x = 1$ and $x = 2$ specify certain extensions of $\Gamma$, categorially constructed as equalizers (as Mike suggested). Call these extensions ${\Gamma|_{x=1}}$ and ${\Gamma|_{x=2}}$ respectively; then if $u$ is any real-valued quantity in the context $\Gamma$, ${u|_{x=1}^2}$ is a real-valued quantity whose context is the product ${\Gamma|_{x=1}} \times {\Gamma|_{x=2}}$. (You should be able to draw this using arrow-theoretic diagrams, making use of the subtraction operation $\mathbb{R} \times \mathbb{R} \to \mathbb{R}$.) If it should so happen that ${\Gamma|_{x=1}}$ and ${\Gamma|_{x=2}}$ are points (terminal objects), then ${u|_{x=1}^2}$ is simply a real number.

    In the case of ${x^2|_{x=1}^2}$, if this appears as a problem in a textbook without any further context, the default interpretation is supposed to be that $\Gamma$ is the largest subset of $\mathbb{R}$ on which $(x \mapsto x^2)$ is defined, in this case the entire real line $\mathbb{R}$; then ${\Gamma|_{x=1}}$ and ${\Gamma|_{x=2}}$ are indeed points, and so ${x^2|_{x=1}^2}$ is indeed a real number (as it happens, $3$). In the context of a word problem where $x$ stands for an inherently positive quantity, it would be more appropriate to take $\Gamma$ to be $]0,\infty[$ instead.¹ But in such problems, I think it even more natural to take $\Gamma$ to be an abstract space, which I think of as the space of possible states of the situation described in the problem. While $\Gamma$ might never be fully defined, various properties of it may be justified as needed on the basis of the intuition behind the problem. The textbooks, by encouraging us to put everything in the problem in terms of a single variable (such as $x$), effectively ask us to find that this variable mediates an isomorphism between $\Gamma$ and some subspace of $\mathbb{R}$ (such as $]0,\infty[$); this specifies $\Gamma$ up to specified isomorphism, so no further intuition is needed. But many problems are easier to solve without expressing everything in terms of one variable, and I encourage my students to take a more flexible approach (especially to things like related rates and optimization problems). This just requires them to be a little more careful about keeping track of the context.
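Here is a toy finite model of this construction, in Python. It makes several simplifying assumptions (the discussion itself has spaces and morphisms in mind, not finite lists):

- The context $\Gamma$ is replaced by a finite list of states.
- Quantities are ordinary functions on that list.
- ${\Gamma|_{x=c}}$ is the equalizer of $x$ and the constant $c$.
- ${u|_{x=1}^{x=2}}$ is the function $(s,t) \mapsto u(t) - u(s)$ on the product of the two equalizers.

```python
from itertools import product

# Hypothetical finite stand-in for the context Gamma: a list of "states".
Gamma = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

def x(s):
    return s            # the real-valued quantity x (here just the state itself)

def u(s):
    return x(s) ** 2    # the quantity x^2

def const(c):
    return lambda s: c

def equalizer(ctx, f, g):
    """The sub-context of states on which the quantities f and g agree."""
    return [s for s in ctx if f(s) == g(s)]

def bar(u, lower_ctx, upper_ctx):
    """u|_p^q as a quantity on the product context: (s, t) |-> u(t) - u(s)."""
    return {(s, t): u(t) - u(s) for s, t in product(lower_ctx, upper_ctx)}

Gamma_x1 = equalizer(Gamma, x, const(1.0))   # Gamma|_{x=1}
Gamma_x2 = equalizer(Gamma, x, const(2.0))   # Gamma|_{x=2}
result = bar(u, Gamma_x1, Gamma_x2)
print(result)   # {(1.0, 2.0): 3.0}
```

Both equalizers are single points here, so the product has one element and the quantity is just the number $3$. With a larger $\Gamma$ (e.g. states carrying a second coordinate) the same code would return a genuinely variable quantity.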

    I suspect that the thing going on here and elsewhere in the “variable quantity” perspective is that an equality like $t=x$ is interpreted in a way more commonly seen in probability/statistics as in $\{x=t\}$ denoting “the set of all states where the random variables $x$ and $t$ assume the same value.”

    Yes, precisely, and this is an equalizer. In general, I'd say that the probability/statistics people have a good handle on this stuff; they know what a random variable really is, after all, and the rest of us just need to learn that all of our variables are much the same sort of thing.

    how does one formalize categorically the notion of two quantities (arrows) being “independent” or “constant” with respect to each other?

    Like Mike, I don't think that this is really a sensible notion without specifying what the other independent variables are supposed to be. Rather, what should be formalized is the idea that one quantity is determined by another. Working in the context $\Gamma$, a $T$-valued quantity $x$ is determined by a $U$-valued quantity $y$ if there exists a morphism $f\colon U \to T$ such that $x = f \circ y$. (This definition appears as one of the fundamental concepts in Lawvere & Schanuel's Conceptual Mathematics.)
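On a nonempty finite set of states, this definition can be tested directly. $x$ is determined by $y$ exactly when $y(s) = y(s')$ forces $x(s) = x(s')$. In that case $f$ can be read off as a lookup table on the values that $y$ actually takes, and extended arbitrarily elsewhere. Here is a Python sketch with a hypothetical seven-state context:

```python
def is_determined_by(ctx, x, y):
    """True iff x(s) == f(y(s)) for all s in ctx, for some f defined on the values of y.

    The dict `table` records that f; a conflict means no such f exists.
    """
    table = {}
    for s in ctx:
        k = y(s)
        if k in table and table[k] != x(s):
            return False
        table[k] = x(s)
    return True

Gamma = range(-3, 4)     # hypothetical finite context of states
y = lambda s: s          # an integer-valued quantity
x = lambda s: s * s      # x = y^2

print(is_determined_by(Gamma, x, y))   # True: x = f . y with f(n) = n^2
print(is_determined_by(Gamma, y, x))   # False: x(-1) == x(1) but y(-1) != y(1)
```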

    What does the “set of states of the world” (also mentioned by Toby) correspond to categorically? Some classifying object?

    Sure, although actually it's a coclassifying object. Just as a principal $G$-bundle on $S$ (for $G$ some topological group and $S$ some topological space) is the same as a continuous map from $S$ to the classifying space $B G$, so an $S$-valued smooth quantity (for $S$ some smooth space) in a given context $\Gamma$ is the same as a smooth map to $S$ from a coclassifying space (which I've been calling simply $\Gamma$ again). So $\Gamma$ is the coclassifying space for the quantities in the problem.


    1. Of course, I write this as $(0,\infty)$ in class, in deference to the textbooks, but I prefer the less overloaded notation $]0,\infty[$.

    • CommentRowNumber51.
    • CommentAuthorTobyBartels
    • CommentTimeNov 25th 2013

    Lawvere & Schanuel's Conceptual Mathematics

    One of my Calculus students came upon this very thread the other day and asked for reading material that would give him some idea of what we were talking about, and I recommended Lawvere & Schanuel. In my opinion, a course using this book should be the first college-level math course that every student takes. Algebra is a prerequisite for it, but not Calculus, so it should come before Calculus. (A bonus is that the practice of requiring Calculus as a prerequisite for unrelated courses such as linear algebra or discrete mathematics, intended to guarantee a level of mathematical maturity, would be served by requiring the course in conceptual mathematics, which is more important to know anyway.)

    Of course, first the math teachers have to learn this stuff!

    • CommentRowNumber52.
    • CommentAuthorMike Shulman
    • CommentTimeNov 25th 2013

    Thanks Toby! How do you define the general form $\int_a^b \omega$ with $a$ and $b$ equations?

    I’m also curious whether you’ve ever tried teaching a course out of Lawvere & Schanuel?

    • CommentRowNumber53.
    • CommentAuthorMike Shulman
    • CommentTimeNov 26th 2013

    Another question for you, Toby, though not closely related to the subject of this thread. In emphasizing differentials more this semester than before, I’ve found that a lot of my students mix up derivatives and differentials. E.g. they will write things like $f'(x) = 2x \, dx$. Do you have any tricks for alleviating or preventing this confusion?

    • CommentRowNumber54.
    • CommentAuthorMike Shulman
    • CommentTimeNov 26th 2013

    I just noticed that Sage’s calculus functions use a notion of symbolic variable which seems quite similar to the “variable quantities” under discussion here. The documentation’s description of them as “elements of the symbolic expression ring” suggests that they have a different mathematical formalization in mind, although I haven’t figured out exactly what that means. But their behavior seems quite similar to what we’ve been talking about, e.g. once you declare a symbolic variable $x$, you can then write $y = x^2+1$ and differentiate $y$ with respect to $x$:

    var('x')
    y = x^2+1
    y.derivative(x)
    

    gives $2x$. It will also try to guess the variable to differentiate with respect to if you don’t give it one:

    y.derivative()
    

    also gives $2x$. Sage also seems to assume that all variables are constant with respect to each other:

    var('t')
    y = x^2 + t^2
    y.derivative(x)
    

    also gives $2x$. However, you can declare one “variable” to be a function of the other instead:

    t = function('t',x)
    w = x^2 + t^2
    w.derivative(x)
    

    uses the chain rule to give `2*t(x)*D[0](t)(x) + 2*x`. Finally, a symbolic expression like these $y$s can’t be evaluated like a function, or at least trying to do so

    y(3)
    

    gives a DeprecationWarning. But you can make it into a “callable symbolic expression” by designating an order of the variables occurring in it:

    z = y.function(x,t)
    z(3,8)
    

    I wonder if this would be a good sort of convention to adopt in a calculus class as well, especially one that involves learning to use Sage.

    • CommentRowNumber55.
    • CommentAuthorTobyBartels
    • CommentTimeNov 27th 2013

    How do you define the general form $\int_a^b \omega$ with $a$ and $b$ equations?

    Now I feel like I ought to think about pulling $\omega$ back to the solution subspace of those equations, but I really only define it for equations with unique solutions on a simply-connected $1$-dimensional domain, that is, expressions that can be reduced to $\int_{x=a}^b f(x) \,\mathrm{d}x$, which I define (following the textbook) as a Riemann integral (although sometimes I feel like I ought to do a Henstock integral). This is an approach that already does not generalize to complex variables, of course; in the multivariable class, I talk about oriented curves and all that.

    • CommentRowNumber56.
    • CommentAuthorTobyBartels
    • CommentTimeNov 27th 2013
    • (edited Nov 27th 2013)

    Do you have any tricks for alleviating or preventing this confusion?

    Not ones that work!

    Mind you, there are plenty of analogous mistakes without differentials. My goal is that they only make mistakes like this that don't make their final answer wrong.

    ETA: So for example, if they put in too many differentials, then they might write this:

    $$
    \begin{aligned}
    f(x) &= \ln(3x+1) \\
    f'(x) &= \frac{\mathrm{d}(3x+1)}{3x+1} \\
    f'(x) &= \frac{3\,\mathrm{d}x+0}{3x+1} \\
    f'(x) &= \frac3{3x+1} ;
    \end{aligned}
    $$

    the middle lines are wrong, but the last is correct (given the first).

    But if they put in too few differentials, then they might write this:

    $$
    \begin{aligned}
    x^5 + y^5 &= x + y \\
    5x^4 + 5y^4 &= 1 + y' \\
    y' &= 5x^4 + 5y^4 - 1 ;
    \end{aligned}
    $$

    now everything is completely wrong (after the first line).

    The latter is a fairly standard Calculus-class error, which using differentials helps to avoid; I much prefer the former error.
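For comparison, doing the second computation with differentials gives $5x^4\,\mathrm{d}x + 5y^4\,\mathrm{d}y = \mathrm{d}x + \mathrm{d}y$, hence $\mathrm{d}y/\mathrm{d}x = (1 - 5x^4)/(5y^4 - 1)$. Here is a quick Python check at the point $(1,1)$, which lies on the curve. It solves for $y$ near $x = 1$ with Newton's method and compares a finite-difference slope with both formulas. The point, step size, and helper names are illustrative choices.

```python
def F(x, y):
    return x**5 + y**5 - x - y      # the curve is F(x, y) = 0

def solve_y(x, y0, steps=50):
    """Newton's method in y for F(x, y) = 0, starting from y0."""
    y = y0
    for _ in range(steps):
        y -= F(x, y) / (5 * y**4 - 1)
    return y

x0, y0 = 1.0, 1.0                   # on the curve: 1 + 1 == 1 + 1
h = 1e-6
slope_numeric = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2 * h)
slope_differentials = (1 - 5 * x0**4) / (5 * y0**4 - 1)   # from the differential computation
slope_student = 5 * x0**4 + 5 * y0**4 - 1                  # the erroneous y' above
print(slope_numeric, slope_differentials, slope_student)
```

The numerical slope is about $-1$, matching the differential computation and not the student's $9$.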

    @Mike #54: nice idea to look at how people have implemented these things in software. Just a quick question for clarification:

    I wonder if this would be a good sort of convention to adopt in a calculus class as well, especially one that involves learning to use Sage.

    Do you mean the convention of distinguishing between “symbolic variables” and “callable symbolic expressions”?

    If yes, it looks to me (at first sight) that these two notions correspond to our distinction between “variable quantities” (maps with unspecified domain) and “functions” with domains some subset of $\mathbb{R}^n$. In the classical notation it might be the difference between writing $f=x^2+1$ and $f(x)=x^2+1$. In the first case $f$ would be a variable quantity, in the second case $f$ is a function from $\mathbb{R}$ to itself. Would you agree?

    • CommentRowNumber58.
    • CommentAuthorMike Shulman
    • CommentTimeNov 27th 2013

    Do you mean the convention of distinguishing between “symbolic variables” and “callable symbolic expressions”?

    I guess that’s mostly what I meant. As I said in #54, Sage’s “symbolic variables” do seem to correspond to our “variable quantities”, but I think its “callable symbolic expressions” are not quite the mathematician’s functions, because they still remember the names of their variables. E.g.

    f(x) = x^2
    

    (another way to define a callable symbolic expression)

    f(3)       ===>  9
    f(x=3)     ===>  9
    f(y=3)     ===>  x^2
    

    So I guess I was wondering whether it would be worth discussing with calculus students the idea of a “function that knows the name of its arguments”.
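To make “a function that knows the names of its arguments” concrete, here is a toy Python sketch. The class `NamedFunction` is hypothetical and is not how Sage implements callable symbolic expressions. It imitates the behavior shown above:

- Positional arguments bind in the declared order.
- Keyword arguments bind by name.
- A keyword that is not a declared name is ignored, as in `f(y=3)`.
- Supplying only some arguments returns a function of the remaining ones.

```python
class NamedFunction:
    """A function that remembers the ordered names of its arguments."""

    def __init__(self, names, body):
        self.names = tuple(names)   # e.g. ("x",) or ("x", "t")
        self.body = body            # a callable accepting those names as keywords

    def __call__(self, *args, **kwargs):
        bound = dict(zip(self.names, args))                                 # by position
        bound.update({k: v for k, v in kwargs.items() if k in self.names})  # by name; others ignored
        if len(bound) < len(self.names):
            # Not everything is bound: return a function of the remaining names.
            rest = tuple(n for n in self.names if n not in bound)
            return NamedFunction(rest, lambda **kw: self.body(**bound, **kw))
        return self.body(**bound)

f = NamedFunction(["x"], lambda x: x**2)
print(f(3), f(x=3))     # 9 9
g = f(y=3)              # 'y' is not an argument name, so nothing gets bound
print(g.names)          # ('x',)

z = NamedFunction(["x", "t"], lambda x, t: x**2 + t**2)
print(z(3, 8), z(t=8)(3))   # 73 73
```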

    • CommentRowNumber59.
    • CommentAuthorzskoda
    • CommentTimeNov 27th 2013
    • (edited Nov 27th 2013)

    Mike 43

    Instead I’m saying, let’s fix some particular “scale” infinitesimal $\epsilon$; then a first-order infinitesimal is one $\eta$ such that $\eta/\epsilon$ is finite (“limited”), a second-order one is such that $\eta/\epsilon^2$ is finite, etc.

    Well, even more: in the ultrafilter model, one looks at sequences with some limiting behaviour, and the integer power law is not the only possibility when comparing asymptotic infinitesimals. You can have exponentially small ones, e.g. $\eta$ such that $\frac{\eta}{\epsilon^{3/2} \exp(-1/\epsilon^2)}$ is finite. I hope you agree. (Sorry for bringing up an issue which has already aged in the thread.)

    • CommentRowNumber60.
    • CommentAuthorMike Shulman
    • CommentTimeNov 27th 2013

    @Zoran: Yes, of course. That’s not even particular to an ultrafilter model, e.g. $\sqrt{\epsilon}$ is still infinitesimal, but “less than first order”. But the integer power law is the relevant one for defining derivatives and higher derivatives.

    • CommentRowNumber61.
    • CommentAuthorzskoda
    • CommentTimeNov 27th 2013
    • (edited Nov 27th 2013)

    Sure, Mike; I was not suggesting that the issue is critical for your calculus discussion, only for the picture of nonstandard analysis that people who know other approaches, primarily SDG, come away with.

    @Toby #56: why would you say that there are too many differentials in the first computation? If we add one more $dx$ (for example, multiplying on the left) it seems correct. To understand the students' confusion, it would be interesting to know what the student was thinking when doing that computation.

    • CommentRowNumber63.
    • CommentAuthorTobyBartels
    • CommentTimeNov 29th 2013

    Sure, too many on the right or too few on the left. I'm basically taking the left-hand side (the simpler one and the one first written down) as indicating what the student meant to do and judging correctness or incorrectness based on that. (But when correcting a paper, I might well amend the left-hand side instead, if that's the simpler fix.)

    • CommentRowNumber64.
    • CommentAuthorMike Shulman
    • CommentTimeNov 29th 2013

    It’s not clear to me that when students make mistakes like this they are thinking anything, in the sense that we would mean the word. Rather, they just don’t seem to have the same understanding we do that mathematical words and symbols have precise meanings and have to be used correctly.

    • CommentRowNumber65.
    • CommentAuthorTobyBartels
    • CommentTimeNov 30th 2013
    • (edited Dec 10th 2013)

    Yeah, I wouldn't want to defend the thesis that the left-hand side indicates what the student intended in any seriously discriminatory way; I mean, I wouldn't want to assume that the student is thinking clearly enough to discriminate between intending $f'(x)$, intending $f'(x) \,\mathrm{d}x$, or intending $\mathrm{d}f(x)$ (the latter two being equal, of course, but maybe not trivially so even to a student who is thinking clearly). I just mean that if I have to pick some way to classify the error (as too many differentials or as too few, in this case), then that's the criterion that I'll use.

    • CommentRowNumber66.
    • CommentAuthorTobyBartels
    • CommentTimeDec 10th 2013

    I’m also curious whether you’ve ever tried teaching a course out of Lawvere & Schanuel?

    No. It might not work very well for the students that we get either; it would need a massive illustrated, hand-holding, problem-filled expansion.

    • CommentRowNumber67.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 4th 2014

    On second derivatives and second differentials … John Armstrong was considering them in 2009 in two posts that unfortunately attracted no comments.

    • CommentRowNumber68.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 8th 2014

    Regarding antidifferentials (#44-49), what about introducing a new notation for “equality up to a local constant”? Since an equation like $\int x^2\, dx = \frac{1}{3} x^3 + C$ is not an “equation involving a variable $x$” in the same sense as $(x+1)^2 = x^2+2x+1$ anyway (you can’t substitute $x=3$ in it to get anything meaningful), it has to be regarded as an “equation between variable quantities”, and then we can change the sense of “equal” as well. Say that if $u$ and $v$ are variable quantities, then $u\equiv v$ means that $u$ and $v$ have the same domain, and on every connected subset of that domain there is a constant $C$ such that $u=v+C$ on that subset (or some simpler version of this statement that would be easier to understand). Then we could write

    $$\int x^2\, dx \equiv \frac{1}{3} x^3$$

    and even

    $$\int \frac{1}{x}\, dx \equiv \ln |x|.$$

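The relation $\equiv$ can be spot-checked numerically. Sample each connected component of the domain and test that $u - v$ is constant there. The Python sketch below does this for $\ln|x|$ against a hypothetical antiderivative of $1/x$ that uses different constants on $(-\infty,0)$ and $(0,\infty)$. Sampling finitely many points is only evidence, not a proof.

```python
import math

def equiv_local_const(u, v, components, tol=1e-12):
    """Sampled check of u ≡ v: u - v is constant on each component (constants may differ)."""
    for pts in components:
        diffs = [u(t) - v(t) for t in pts]
        if max(diffs) - min(diffs) > tol:
            return False
    return True

u = lambda t: math.log(abs(t))
v = lambda t: math.log(-t) + 5 if t < 0 else math.log(t) - 2   # hypothetical antiderivative of 1/x
w = lambda t: math.log(abs(t)) + t                             # differs by a non-constant amount

components = [[-3.0, -2.0, -0.5], [0.25, 1.0, 7.0]]   # samples from (-inf, 0) and (0, inf)
print(equiv_local_const(u, v, components), equiv_local_const(u, w, components))  # True False
```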
    • CommentRowNumber69.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 9th 2014

    Re #67: I remember stumbling over that issue sometime as an undergrad, or maybe even a grad student. I think I spent days, or at least hours, trying to figure out why some computation wasn’t working, before I realized that I was implicitly assuming a version of “Cauchy’s invariant rule” for second derivatives (though I didn’t know the name of it), and that it might not be true.

    From the perspective of #33 above, the problem arises from neglecting the $d^2 x$ terms that ought to be there in the second differential. I certainly didn’t understand that at the time, but I might have if someone had taught me calculus using differentials to start with!

    • CommentRowNumber70.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 9th 2014

    @Toby, did either of the two answers on your MO question ever pan out? The Hasse-Schmidt one seems promising, as you said, but as stated it seems to be purely algebraic and so only applies to polynomials. Also, if I understood it correctly, there isn’t an operator $d$ that could be applied to anything already containing $d$s; instead there is a separate $d^2$ operator which is just asserted to satisfy the Leibniz rule that you would expect if it were actually “$d$-of-$d$”.

    • CommentRowNumber71.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 9th 2014
    • (edited Feb 9th 2014)

    Re #68: Then an important basic result (an easy corollary of the Mean Value Theorem) is that (for differentiable quantities) $u \equiv v$ is equivalent to $\mathrm{d}u = \mathrm{d}v$.

    Actually, I've considered formally defining $\mathrm{d}$ to be the operation taking $u$ to its ${\equiv}$-equivalence class. Then all of the hard work goes into defining multiplication of such an equivalence class by an ordinary quantity (or more precisely into defining the equality relation on formal linear combinations of differentials with coefficients from the ring of quantities). Note that naïvely, every quantity has a differential in this sense, but we'll find that things are better behaved when we restrict to differentiable quantities.

    • CommentRowNumber72.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 9th 2014

    Re #69: I dare say that I spent years on this, off and on, struggling to figure out what the heck was going on. It may actually have only been when I was first assigned to teach Calculus that I forced myself to come to some resolution (and shortly thereafter started writing M.O questions about it). I remember struggling with the minus sign in $\mathrm{d}y/\mathrm{d}x = -(\partial{F}/\partial{x})/(\partial{F}/\partial{y})$ around the same time (although I resolved that one much earlier).

    Re #70: No, I never really slogged through the linked articles. I've really just these past few months settled on my own answer. To wit: $\mathrm{d}f$ is the operation that maps a smooth curve $c$ to $(f \circ c)'(0)$; $\mathrm{d}^2f$ maps $c$ to $(f \circ c)''(0)$, and so on. Of course, $f$ itself maps $c$ to $(f \circ c)(0)$. Then we just take the subring generated by the above, within the ring of all operations that map a curve to a number (which is commutative). At least for smooth functions, that's all that there is to it.
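Here is a minimal numerical sketch of these operations for curves in $\mathbb{R}$, in Python. The finite-difference approximations of $(f\circ c)'(0)$ and $(f\circ c)''(0)$ and the two test curves are assumptions made for illustration. The sketch shows that $\mathrm{d}^2x$ is a genuinely nonzero operation, distinct from $(\mathrm{d}x)^2$. It vanishes on the straight line $c_2(t) = 3t$ but not on $c_1(t) = t + t^2$.

```python
def d(k, f, c, h=1e-3):
    """Approximate <d^k f | c> = (f . c)^(k)(0) by central differences, for k = 0, 1, 2."""
    g = lambda t: f(c(t))
    if k == 0:
        return g(0.0)
    if k == 1:
        return (g(h) - g(-h)) / (2 * h)
    if k == 2:
        return (g(h) - 2 * g(0.0) + g(-h)) / h**2
    raise ValueError("this sketch only handles k = 0, 1, 2")

x = lambda p: p              # the coordinate function on R
c1 = lambda t: t + t**2      # hypothetical curved path
c2 = lambda t: 3 * t         # a straight line

print(round(d(1, x, c1), 6), round(d(2, x, c1), 6))   # 1.0 2.0
print(round(d(1, x, c2), 6), round(d(2, x, c2), 6))   # 3.0 0.0

# Operations multiply pointwise, e.g. (dx)^2 : c |-> (<dx|c>)^2.
dx_squared = lambda c: d(1, x, c) ** 2
print(round(dx_squared(c1), 6))                        # 1.0
```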

    • CommentRowNumber73.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 9th 2014

    Is there a derivation $d$ that maps that entire subring to itself? It’s clear what it should do on the generators, of course, but it’s not immediately obvious to me that that yields a well-defined operation.

    Anyway, it sounds like a reasonable answer, but I find it a bit unsatisfying not to have a more intrinsic characterization of the subring in question, and also to have to assume in advance the notion of smooth.

    • CommentRowNumber74.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 9th 2014

    I'm not sure what you mean by

    have to assume in advance the notion of smooth

    As far as the M.O question is concerned, we're working on a smooth manifold (in fact a Cartesian space, without loss of generality), so we have this notion. Even if we then try to make it work more generally for diffeological spaces or the like, all of these still start out with some notion of smooth. (It's the other thread where we're trying to define everything in terms of curves in very general spaces; here we're still trying to understand $\mathbb{R}^n$.)

    But if instead you mean that it's unsatisfying to only define this for smooth maps (so not to extend to the case where, say, $\mathrm{d}^2 f$ exists but $\mathrm{d}^3 f$ does not), then I think that it should still work, just with extra effort to keep track of when things might be undefined. (Again, we know ahead of time what's $C^k$ and what's not, so we already know when $\mathrm{d}$ should be defined.)

    It’s clear what it should do on the generators, of course, but it’s not immediately obvious to me that that yields a well-defined operation.

    Ah, good point! Actually, I think that I can extend $\mathrm{d}$ (partially defined) to every operation whatsoever taking a smooth parametrized curve to a real number. Given the curve $c$ and a real number $h$, let $c_h$ be the reparametrization of $c$ given by $t \mapsto c(t + h)$. Then given the operation $\eta$ (so $\langle{\eta{|}c}\rangle$ is a number), define $\mathrm{d}\eta$ so that

    $$\langle{\mathrm{d}\eta{|}c}\rangle \coloneqq \lim_{h \to 0} \frac{\langle{\eta{|}c_h}\rangle - \langle{\eta{|}c}\rangle}{h}$$

    if this exists. (You can leave $\mathrm{d}\eta$ as a partially defined operation, or declare that $\mathrm{d}\eta$ exists only if this limit exists for all $c$.)

    This manifestly depends only on the underlying operation, and it does the right thing, recursively, to smooth maps.
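Here is a small Python sketch of this definition. It treats an operation as any callable that takes a curve (a function of $t$) to a number. As an assumption, the limit is replaced by a symmetric difference quotient with a fixed step. The symmetric quotient is the average of the forward and backward quotients, so it tends to the same limit whenever that limit exists. The sketch checks that applying $\mathrm{d}$ once and twice to the coordinate $x$ recovers $(x\circ c)'(0)$ and $(x\circ c)''(0)$ for one test curve.

```python
import math

def shift(c, h):
    """The reparametrized curve c_h : t |-> c(t + h)."""
    return lambda t: c(t + h)

def d(eta, h=1e-3):
    """Approximate d(eta) : c |-> lim (<eta|c_h> - <eta|c>) / h by a symmetric quotient."""
    return lambda c: (eta(shift(c, h)) - eta(shift(c, -h))) / (2 * h)

x = lambda c: c(0.0)     # the quantity x as the operation c |-> x(c(0))
dx = d(x)                # should give (x . c)'(0)
ddx = d(dx)              # should give (x . c)''(0)

c = lambda t: math.sin(t) + t**2    # hypothetical curve: (x.c)'(0) = 1, (x.c)''(0) = 2
print(round(dx(c), 4), round(ddx(c), 4))   # 1.0 2.0
```

Note that `d` never inspects how `eta` was built, which mirrors the point that the definition depends only on the underlying operation.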

    • CommentRowNumber75.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 10th 2014

    Very nice! You can exclude some uninteresting things by restricting to germs of curves, and I think you can even omit the a priori restriction to smooth curves: consider partial real-valued functions from the set of germs (at 0) of all curves, and say a curve $c$ is smooth if $d^n x$ is defined at $c$ for all coordinate functions $x$. (I’m not sure exactly what I was complaining about re: “smooth”, but whatever it was, this makes me happier.) That feels kind of Froelicher: given the relation $\langle \eta | c \rangle$ between partial operations and curves, we consider the fixed point of the resulting Galois connection generated by the coordinate functions. The point in the other thread is that this doesn’t correctly isolate the differentiable functions on the other side: even if $d f$ is defined, as an operation, on all smooth $c$, then $f$ may not be differentiable in the usual sense unless $d f$ additionally depends only on the tangent vector of a curve and is a linear function thereof. Right?

    Interestingly, I think this context also allows operations like $e^{dx}$: it’s the operation that takes $c$ to $e^{(x\circ c)'(0)}$. And presumably its differential is $d(e^{dx}) = e^{dx}\, d^2x$. I’m not sure whether this is a good thing or not. I’m currently playing around with a different idea for defining higher differentials; if it works I may post it somewhere.

    Can you think of a good name for these things that include differentials and also higher ones? We can’t really call them “differential forms” once they have $d^2x\neq 0$ and $dx\,dy = dy\,dx$.

    • CommentRowNumber76.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 10th 2014

    (I guess I’m having trouble separating the threads, sorry – in my mind it’s all one discussion. (-: )

    • CommentRowNumber77.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 10th 2014

    We certainly can call them differential forms even when $\mathrm{d}x \,\mathrm{d}y = \mathrm{d}y \,\mathrm{d}x$; they're just not exterior differential forms. The term ‘form’ is quite general and has a venerable history. (Compare ‘quadratic form’, ‘symmetric bilinear form’, etc.) In M.O, I said ‘cojet differential form’, which is not quite as nice a term as ‘exterior differential form’ (since ‘cojet’ is a noun rather than an adjective like ‘exterior’), but it does get at the right idea: that they act on spaces of jets (the limit of which is the space of germs, as you noted).

    • CommentRowNumber78.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 11th 2014

    I like your $\mathrm{e}^{\mathrm{d}x}$; I have successfully calculated $\mathrm{d}(\mathrm{e}^{\mathrm{d}x}) = \mathrm{e}^{\mathrm{d}x} \,\mathrm{d}^2x$ (using Taylor's Theorem with Peano's remainder); actually, the calculation works for $\mathrm{e}^\omega$ generally.

    Generalizing still further, I conclude that

    $$\mathrm{d}(f(\omega_1, \ldots, \omega_n)) = D_1{f}(\omega_1, \ldots, \omega_n) \,\mathrm{d}\omega_1 + \cdots + D_n{f}(\omega_1, \ldots, \omega_n) \,\mathrm{d}\omega_n$$

    for any differentiable function $f$ of $n$ variables, by pushing everything through the definition, applying Taylor's Theorem to $f$, and observing that the unwanted terms drop out in the limit. What more could one possibly want? (In particular, $\mathrm{d}$ is a derivation.)
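Here is a numerical spot check of two instances of this formula with $n = 1$. It uses a symmetric-difference approximation of the $\mathrm{d}$ from #74 as a hypothetical Python stand-in for the exact limit.

- $f = \exp$ with $\omega_1 = \mathrm{d}x$ gives $\mathrm{d}(\mathrm{e}^{\mathrm{d}x}) = \mathrm{e}^{\mathrm{d}x}\,\mathrm{d}^2x$.
- $f(u) = u^2$ gives the Leibniz-rule instance $\mathrm{d}((\mathrm{d}x)^2) = 2\,\mathrm{d}x\,\mathrm{d}^2x$.

The test curve is an arbitrary choice; agreement on one curve is evidence, not a proof.

```python
import math

def shift(c, h):
    """The reparametrized curve c_h : t |-> c(t + h)."""
    return lambda t: c(t + h)

def d(eta, h=1e-3):
    """Symmetric-difference stand-in for d(eta) : c |-> lim (<eta|c_h> - <eta|c>) / h."""
    return lambda c: (eta(shift(c, h)) - eta(shift(c, -h))) / (2 * h)

x = lambda c: c(0.0)                  # the coordinate x as an operation on curves
dx, ddx = d(x), d(d(x))
exp_dx = lambda c: math.exp(dx(c))    # the operation e^{dx}
dx_sq = lambda c: dx(c) ** 2          # the operation (dx)^2

c = lambda t: math.sin(t) + t**2      # hypothetical test curve

lhs_exp, rhs_exp = d(exp_dx)(c), exp_dx(c) * ddx(c)      # d(e^{dx}) vs e^{dx} d^2x
lhs_sq, rhs_sq = d(dx_sq)(c), 2 * dx(c) * ddx(c)         # d((dx)^2) vs 2 dx d^2x
print(lhs_exp, rhs_exp)   # both close to 2e
print(lhs_sq, rhs_sq)     # both close to 4
```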

    Technicality: You wrote in part

    say a curve $c$ is smooth if $d^n x$ is defined at $c$ for all coordinate functions $x$

    You mean that $c$ is smooth at $0$, or else you mean that $\mathrm{d}^n x$ must be defined at $c_h$ for all $x$ and all real numbers $h$.

    in my mind it’s all one discussion

    Certainly you borrowed notation from an off-site file linked only in the other thread!

    • CommentRowNumber79.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    You mean that $c$ is smooth at $0$

    Yes, thanks.

    Certainly you borrowed notation from an off-site file linked only in the other thread!

    Really? What notation? You used $\langle{\eta{|}c}\rangle$ up in #74 here…

    spaces of jets (the limit of which is the space of germs

    Technicality again, but that doesn’t seem quite right to me; at least, I can’t see a sense in which it’s true. In particular, a germ is not determined by its $k$-jets for $k\lt\infty$, is it?

    We certainly can call them differential forms

    Okay, I see the point that it’s historically fine, but my experience is that nowadays mathematicians pretty universally say “differential form” to mean “exterior differential form”. I guess “cojet differential form” would suffice to clarify, which might get abbreviated to “cojet form”.

    I think my main worry is using the same symbol $d$ for the cojet differential and the exterior differential. For instance, pedagogically speaking, if I teach my calc 1 or calc 2 students to calculate with cojet differentials, aren’t they going to be confused when they get to multivariable and I tell them that now $d^2=0$?

    • CommentRowNumber80.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    I wonder whether cojet forms and exterior forms could be unified in a larger framework? In some sense, all these cojet forms are still only 1-forms: even though they involve higher derivatives, they only act on curves. But we could consider instead real-valued operators on germs of parametrized surfaces or hypersurfaces as well. For instance, if $\omega$ is an operator on germs of curves, we could define its exterior differential $\hat{d}\omega$ as an operator on germs of surfaces by

    $$\langle \hat{d}\omega {|} c \rangle = \lim_{t\to 0} \frac{ \langle \omega {|} \lambda s.c(s,0) \rangle + \langle \omega {|} \lambda s.c(t,s) \rangle - \langle \omega {|} \lambda s.c(s,t) \rangle - \langle \omega {|} \lambda s.c(0,s) \rangle }{t}$$

    or perhaps in the case when ω\omega might be nonlinear it would be better to say

    $$\langle \hat{d}\omega {|} c \rangle = \lim_{t\to 0} \frac{ \langle \omega {|} \lambda s.c(s,0) \rangle + \langle \omega {|} \lambda s.c(t,s) \rangle + \langle \omega {|} \lambda s.c(-s,t) \rangle + \langle \omega {|} \lambda s.c(0,-s) \rangle }{t}$$

    I haven’t checked that this is at all sensible. But it also starts (unsurprisingly) to make me think of the Weil algebras that define the infinitesimal objects in SDG.
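For an ordinary (linear) 1-form $\omega$, the first quotient tends to $\mathrm{d}\omega(\partial_1 c, \partial_2 c)$ at $c(0,0)$. The reason is that it is a difference quotient of the pulled-back components along the sides of a small square. The Python sketch below checks this numerically for $\omega = x\,\mathrm{d}y$ and $\omega = y\,\mathrm{d}x$. It uses a small fixed $t$ and central differences in place of the limits, and the two surfaces are hypothetical examples. It says nothing about nonlinear $\omega$ or about the second formula.

```python
def deriv(g, h=1e-5):
    """Central-difference approximation of g'(0) for a real function g."""
    return (g(h) - g(-h)) / (2 * h)

def omega_x_dy(gamma):
    """<x dy | gamma> = x(gamma(0)) * (y . gamma)'(0), for a curve gamma : R -> R^2."""
    x0, _ = gamma(0.0)
    return x0 * deriv(lambda s: gamma(s)[1])

def omega_y_dx(gamma):
    """<y dx | gamma> = y(gamma(0)) * (x . gamma)'(0)."""
    _, y0 = gamma(0.0)
    return y0 * deriv(lambda s: gamma(s)[0])

def d_hat(omega, c, t=1e-4):
    """The first formula above, evaluated at a small fixed t instead of the limit."""
    return (omega(lambda s: c(s, 0.0))
            + omega(lambda s: c(t, s))
            - omega(lambda s: c(s, t))
            - omega(lambda s: c(0.0, s))) / t

identity = lambda u, v: (u, v)
scaled = lambda u, v: (2 * u, 3 * v)     # hypothetical surface with Jacobian determinant 6

print(round(d_hat(omega_x_dy, identity), 6))   # 1.0   since d(x dy) = dx ^ dy
print(round(d_hat(omega_y_dx, identity), 6))   # -1.0  since d(y dx) = -dx ^ dy
print(round(d_hat(omega_x_dy, scaled), 6))     # 6.0
```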

    • CommentRowNumber81.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    Here’s another thought: can we integrate an arbitrary cojet form? Suppose $\omega$ is a real-valued operator on germs of curves, and let $c$ be a curve defined on $(a-\epsilon,b+\epsilon)$. Then we have a function $f:[a,b]\to\mathbb{R}$ defined by

    $$ f(x) = \langle \omega {|} c_{x} \rangle $$

    and we could define

    $$ \oint_c \omega = \int_{a}^b f(x)\, dx $$

    if the RHS exists. It seems like it ought to follow that

    $$ \oint_c d\omega = \langle \omega {|} c_b \rangle - \langle \omega {|} c_a \rangle. $$

    (where $d$ is the commutative cojet differential). But it’s late at night, so I could be spewing nonsense…
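A numerical sketch of the claimed identity in the simplest setting, curves $c:\mathbb{R}\to\mathbb{R}$. It assumes (our reading, not something stated in this comment) that $c_x$ is the shifted germ $s \mapsto c(x+s)$ and that the cojet differential acts by $\langle d\omega | c\rangle = \lim_{h\to 0} (\langle \omega | c_h\rangle - \langle \omega | c\rangle)/h$. Under that reading the identity is just the fundamental theorem of calculus applied to $f(x) = \langle\omega|c_x\rangle$. The example form $dx^2$, the curve $c(t) = t^3$, and all helper names are hypothetical; limits are replaced by central differences and the integral by Simpson's rule.

```python
H = 1e-4  # finite-difference step (arbitrary choice)

def shift(c, x):
    """The germ c_x, modelled as the shifted curve s -> c(x + s)."""
    return lambda s: c(x + s)

def dx_squared(c):
    """The cojet form dx^2: the squared velocity of c at 0."""
    return ((c(H) - c(-H)) / (2 * H)) ** 2

def cojet_d(omega):
    """Assumed cojet differential: rate of change of omega along the shifts."""
    return lambda c: (omega(shift(c, H)) - omega(shift(c, -H))) / (2 * H)

def integrate(omega, c, a, b, n=1000):
    """Simpson's rule for the integral over [a, b] of x -> <omega | c_x>; n even."""
    h = (b - a) / n
    total = omega(shift(c, a)) + omega(shift(c, b))
    for i in range(1, n):
        total += (4 if i % 2 else 2) * omega(shift(c, a + i * h))
    return total * h / 3

c = lambda t: t ** 3
lhs = integrate(cojet_d(dx_squared), c, 0.0, 1.0)
rhs = dx_squared(shift(c, 1.0)) - dx_squared(shift(c, 0.0))
print(lhs, rhs)  # both close to 9
```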

    • CommentRowNumber82.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 11th 2014

    You used $\langle{\eta{|}c}\rangle$ up in #74 here…

    Oops, never mind, that was me, not you!

    a germ is not determined by its $k$-jets for $k\lt\infty$, is it?

    Ah, no, I must have been implicitly assuming that every function (or at least every smooth function) is analytic, and we wouldn't want to restrict to analytic curves. Still, these operations do depend only on the jets, even when the germs differ. But germs are a simpler concept.

    if I teach my calc 1 or calc 2 students to calculate with cojet differentials, aren’t they going to be confused when they get to multivariable and I tell them that now $d^2=0$?

    In my Calculus classes, I've been using $\mathrm{d} \wedge \eta$ for the exterior differential of $\eta$. They've already seen $\eta \wedge \zeta$ by this point, and this gives the right idea regarding skew-commutativity. (In particular, the signs in the product rule

    $$ \mathrm{d} \wedge (\eta \wedge \zeta) = (\mathrm{d} \wedge \eta) \wedge \zeta + (-1)^{|\eta|} \eta \wedge (\mathrm{d} \wedge \zeta) = (-1)^{(1 + {|\eta|}){|\zeta|}} \zeta \wedge \mathrm{d} \wedge \eta + (-1)^{|\eta|} \eta \wedge \mathrm{d} \wedge \zeta $$

    come out right that way. Not that I ever write down anything like this in that class.) So $\mathrm{d} \wedge \mathrm{d} \wedge \eta = 0$, but this is very different from $\mathrm{d}^2 \eta = \mathrm{d} (\mathrm{d} \eta)$.

    I do tell them that people usually don't put the wedge in there (and that they sometimes don't put the wedge in the wedge product either), and this is OK because they're restricting attention to exterior differential forms.

    But even though I don't actually use higher differentials in my Calculus classes¹, they do see differential forms that aren't exterior forms. There are the absolute differential forms, of course, but there's more; consider

    $$ đs = \sqrt{\mathrm{d}x^2 + \mathrm{d}y^2} . $$

    It would be criminal not to introduce that in class! But what is $\mathrm{d}x^2$ (or ${|\mathrm{d}x|}^2$)? It can be thought of as a symmetric bilinear form, but it's also a cojet form. (The two operations, one on a pair of curves and one on a single curve, are related by polarization.)


    1. Now that I understand them better, I might. But expressing, say, the second derivative test for extreme values in terms of differentials instead of derivatives looks so different that it may be too difficult, when it's not in the book. Anyway, the main reason for using differentials in class is that people use them in applied fields, so it's not so justifiable to bring in something that you and I invented ourselves.

    • CommentRowNumber83.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    these operations do depend only on the jets, even when the germs differ

    That’s true if by “these operations” you mean the ones constructed from functions by applying the cojet $d$ and algebra operations. In #72 you suggested generating a subring, so I guess this is what you’re thinking of. But $e^{dx}$ wouldn’t be in that subring, nor would $\sqrt{dx^2 + dy^2}$; we’d need to close up under more functions than the ring operations. The whole ring of operations on germs, of course, might include operations that really do depend on the whole germ rather than only on the jets, although I can’t think of any examples off the top of my head.

    In my Calculus classes, I've been using $\mathrm{d} \wedge \eta$ for the exterior differential of $\eta$

    That’s good! I might do the same when I get to exterior derivatives. (Although I still haven’t decided whether I can justify talking about exterior differential forms at all, given that our standard textbook does everything the traditional way in terms of vectors. Is there a good multivariable calculus textbook that uses differential forms?)

    the main reason for using differentials in class is that people use them in applied fields

    Hmm, that’s one good reason, but I think another good reason is that they just make the concepts easier to understand and the computations easier to do. However, it’s not clear to me that higher cojet differentials would be of much use in single-variable calc for either of those purposes. The main advantage I see right now is if I could somehow avoid talking about derivatives at all and use only differentials, but to be really effective that would require a supporting textbook.

    • CommentRowNumber84.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    One issue with my proposed notion of integration in #81 is that in general, it will depend on the parametrization of the curve, whereas the integral of an ordinary 1-form along a curve does not (though it does depend on its orientation). However, it does include integration with respect to $ds = \sqrt{dx^2+dy^2}$, which is also parametrization-invariant; I guess what matters for that is not linearity but “degree-1 homogeneity”.

    Does it also include integration of absolute 1-forms? Can an absolute 1-form be regarded as a cojet form like $|dx|$ defined by

    $$ \langle {|\omega|} ; c\rangle = {\Big|\langle \omega ; c\rangle\Big|} ? $$

    (I changed your notation $\langle \omega | c \rangle$ to $\langle \omega ; c \rangle$ to avoid confusion with the absolute value bars.)
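To see the parametrization issue concretely, here is a sketch that applies the integral of #81 to three forms on two parametrizations of the same segment from $(0,0)$ to $(1,1)$. Each of these three forms happens to see only the velocity of a germ, so the sketch (an assumption made for brevity) passes velocities rather than whole germs; it uses central differences and Simpson's rule, and the curve names are hypothetical. One expects $ds$ and $|dx|$, which are homogeneous of degree 1, to give the same value for both parametrizations, while $dx^2$ should not.

```python
import math

H = 1e-6  # central-difference step (arbitrary choice)

def velocity(c, t):
    """Approximate c'(t) for a curve c: R -> R^2."""
    (x1, y1), (x0, y0) = c(t + H), c(t - H)
    return ((x1 - x0) / (2 * H), (y1 - y0) / (2 * H))

def dx_sq(v):
    return v[0] ** 2

def ds(v):
    return math.hypot(v[0], v[1])

def abs_dx(v):
    return abs(v[0])

def integrate(form, c, a, b, n=1000):
    """Simpson's rule for the integral of t -> form(c'(t)) over [a, b]; n even."""
    h = (b - a) / n
    f = lambda t: form(velocity(c, t))
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

steady = lambda t: (t, t)              # constant speed
slow_start = lambda t: (t * t, t * t)  # same segment, reparametrized

for name, form in [("dx^2", dx_sq), ("ds", ds), ("|dx|", abs_dx)]:
    print(name, integrate(form, steady, 0, 1), integrate(form, slow_start, 0, 1))
```

With these choices $ds$ gives $\sqrt{2}$ and $|dx|$ gives $1$ for both curves, while $dx^2$ gives $1$ for `steady` and $4/3$ for `slow_start`.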

    • CommentRowNumber85.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 11th 2014

    Re: #80, the wedge product of two cojet 1-forms $\omega$ and $\eta$ ought probably to be the “cojet 2-form” defined on a surface germ $c$ by

    $$ \langle \omega\wedge\eta {|} c \rangle = \langle\omega {|} \lambda s.c(s,0) \rangle \cdot \langle\eta {|} \lambda s.c(0,s) \rangle - \langle\omega {|} \lambda s.c(0,s) \rangle \cdot \langle\eta {|} \lambda s.c(s,0) \rangle $$
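For linear $\omega$ and $\eta$ this should reduce to the usual wedge product; in particular $dx\wedge dy$ should return the Jacobian determinant of the surface germ $c$ at $(0,0)$. A quick numerical sketch, modelling germs as Python functions and the forms $dx$, $dy$ by central-difference velocities (illustrative assumptions), with the hypothetical surface $c(s,t) = ((2+s)\cos t, (2+s)\sin t)$, whose Jacobian determinant at the origin is $2$:

```python
import math

H = 1e-6  # central-difference step (arbitrary choice)

def velocity(gamma):
    """Approximate gamma'(0) for a curve germ gamma: R -> R^2."""
    (x1, y1), (x0, y0) = gamma(H), gamma(-H)
    return ((x1 - x0) / (2 * H), (y1 - y0) / (2 * H))

def dx(gamma):
    return velocity(gamma)[0]

def dy(gamma):
    return velocity(gamma)[1]

def wedge(omega, eta):
    """The cojet 2-form omega ∧ eta, acting on a surface germ c(s, t)."""
    def two_form(c):
        first = lambda s: c(s, 0.0)   # λs.c(s,0)
        second = lambda s: c(0.0, s)  # λs.c(0,s)
        return omega(first) * eta(second) - omega(second) * eta(first)
    return two_form

c = lambda s, t: ((2 + s) * math.cos(t), (2 + s) * math.sin(t))
print(wedge(dx, dy)(c), wedge(dy, dx)(c), wedge(dx, dx)(c))  # about 2, -2, 0
```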
    • CommentRowNumber86.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 11th 2014

    I still haven’t decided whether I can justify talking about exterior differential forms at all, given that our standard textbook does everything the traditional way in terms of vectors. Is there a good multivariable calculus textbook that uses differential forms?

    I don't know of one; even Dray & Minogue don't go that far.

    My justification is that they're already integrating differential forms; the classical expression $\int \mathbf{F} \cdot d\mathbf{r}$ is already the integral of a differential form; you just need to take it literally. All of the formulas are in my handout (where Page 6 is strictly time-permitting … which so far it hasn't been).

    • CommentRowNumber87.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 12th 2014

    Suppose I start with a function and take its cojet differential over and over again.

    $$ d f(x) = f'(x)\, dx $$
    $$ d^2f(x) = f''(x)\, dx^2 + f'(x)\, d^2x $$
    $$ d^3f(x) = f'''(x)\, dx^3 + 3 f''(x)\, dx\cdot d^2x + f'(x)\, d^3x $$
    $$ d^4f(x) = f^{(4)}(x)\, dx^4 + 6 f'''(x)\, dx^2\, d^2x + f''(x)\big(3(d^2x)^2 + 4\, dx \cdot d^3x\big) + f'(x)\, d^4x $$
    $$ d^5f(x) = f^{(5)}(x)\, dx^5 + 10 f^{(4)}(x)\, dx^3 \cdot d^2 x + f'''(x)\big(15\, dx\cdot (d^2x)^2 + 10\, dx^2 \cdot d^3x\big) + f''(x) \big(10\, d^2x \cdot d^3x + 5\, dx \cdot d^4x\big) + f'(x)\, d^5 x $$

    It appears that each term in $d^n f(x)$ is of the form

    $$ a f^{(k)}(x)\, d^{i_1}x \cdot d^{i_2}x \cdot \cdots \cdot d^{i_k}x $$

    for some $k\le n$ and some (unordered) partition $i_1 + i_2 + \cdots + i_k = n$. Are the coefficients appearing here some well-known combinatorial numbers associated to partitions?
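The pattern can be explored by machine. The sketch below (a hypothetical encoding, not anything from the thread) represents a polynomial in the $f^{(k)}(x)$ and $d^i x$ as a table from pairs $(k, (i_1,\ldots,i_k))$ to coefficients, and applies the commutative cojet differential using only the product rule together with $d\big(f^{(k)}(x)\big) = f^{(k+1)}(x)\,dx$ and $d(d^i x) = d^{i+1}x$. Each term keeps exactly $k$ factors $d^{i}x$ alongside $f^{(k)}(x)$, so the sorted tuple of exponents determines the term.

```python
from collections import Counter

def cojet_d(poly):
    """Apply the commutative cojet differential to a Counter mapping
    (k, sorted tuple of i's) -> coefficient of f^(k)(x) * prod d^i x."""
    out = Counter()
    for (k, idx), coeff in poly.items():
        # product rule, part 1: d(f^(k)(x)) = f^(k+1)(x) dx
        out[(k + 1, tuple(sorted(idx + (1,))))] += coeff
        # product rule, part 2: d(d^i x) = d^(i+1) x, one factor at a time
        for j in range(len(idx)):
            bumped = idx[:j] + (idx[j] + 1,) + idx[j + 1:]
            out[(k, tuple(sorted(bumped)))] += coeff
    return out

def iterated_d(n):
    """d^n f(x), starting from f(x) itself (k = 0, no factors)."""
    poly = Counter({(0, ()): 1})
    for _ in range(n):
        poly = cojet_d(poly)
    return poly

for (k, idx), coeff in sorted(iterated_d(4).items(), reverse=True):
    factors = " ".join(f"d^{i}x" for i in idx)
    print(f"{coeff} f^({k})(x) {factors}")
```

For instance, `iterated_d(4)[(2, (1, 3))]` is the coefficient of $f''(x)\,dx\cdot d^3x$ in $d^4 f(x)$, namely $4$; the same computation gives $10$ as the coefficient of $f^{(4)}(x)\,dx^3\,d^2x$ in $d^5 f(x)$.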

    • CommentRowNumber88.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 12th 2014

    Over in the other thread, David R posted a link to an MO answer which reminded me to look back at Arnold’s book on classical mechanics, which suggests the following definition of the exterior differential of a cojet (or perhaps “cogerm” would be more appropriate) 1-form:

    $$ \langle d\wedge \eta {|} S \rangle = \lim_{c\to 0} \frac{1}{{|c|}^2} \oint_{S\circ c} \eta $$

    where $c$ is a loop inside the parametrized surface $S$ which shrinks to nothing around $(0,0)$. (It might be a rectangle or parallelogram, but from the general perspective that restriction seems unaesthetic.)
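Reading ${|c|}^2$ as the area enclosed by $c$ (as suggested in #89) and taking $c$ to be a small circle of radius $r$ about $(0,0)$, traversed once counterclockwise, the quotient can be evaluated numerically for a linear 1-form $\eta = P\,dx + Q\,dy$. By Stokes' theorem it should approach $d\eta$ evaluated on $\partial_u S, \partial_v S$ at $(0,0)$. In this sketch the surface $S(u,v) = (1+u, 3v+u^2)$ and the form $\eta = x^2\,dy$ are hypothetical choices, for which that value is $2x\cdot\det(\partial S) = 2\cdot 1\cdot 3 = 6$; the loop integral uses the periodic trapezoidal rule and central differences.

```python
import math

def loop_integral(P, Q, S, r, n=2000, h=1e-5):
    """∮ P dx + Q dy along θ -> S(r cos θ, r sin θ), θ in [0, 2π]."""
    def gamma(a):
        return S(r * math.cos(a), r * math.sin(a))
    total = 0.0
    for j in range(n):
        a = 2 * math.pi * j / n
        x, y = gamma(a)
        (x1, y1), (x0, y0) = gamma(a + h), gamma(a - h)
        total += P(x, y) * (x1 - x0) / (2 * h) + Q(x, y) * (y1 - y0) / (2 * h)
    return total * (2 * math.pi / n)

def circulation_density(P, Q, S, r=1e-3):
    """Loop integral divided by the area π r^2 enclosed by the parameter circle."""
    return loop_integral(P, Q, S, r) / (math.pi * r * r)

S = lambda u, v: (1 + u, 3 * v + u * u)
print(circulation_density(lambda x, y: 0.0, lambda x, y: x * x, S))  # close to 6
```

This only illustrates the degree-1 linear case, where the loop integral is reparametrization-invariant; it does not address the concern in #89 about other 1-forms.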

    Comparing this to the definition of the differential $d$ from cogerm 1-forms to cogerm 1-forms, and its relationship to the exterior differential acting from 0-forms to 1-forms, suggests the following operation from cogerm 2-forms to cogerm 2-forms:

    $$ \langle d \omega {|} S \rangle = \lim_{c\to 0} \frac{1}{{|c|}^2} \int_{t=a}^b \langle \omega {|} S_{c(t)} \rangle \, dt $$

    where $c$ is a loop as before, with domain $[a,b]$, and $S_{(u,v)}(s,t) = S(s+u,t+v)$ is a shifted version of the surface. Is this a 2-form version of the cogerm differential?

    Just throwing stuff out there at the moment, hoping sometime soon I’ll have time to think about it all carefully.

    • CommentRowNumber89.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 12th 2014

    Probably “${|c|}^2$” should instead be the area enclosed by $c$. But having thought about it a little more, I realized those limits don’t really make sense unless the integrals are invariant under reparametrization. So maybe the exterior differential doesn’t really make sense except for degree-1 1-forms? And is there any sort of commutative differential on 2-forms? Would we hope or expect it to behave in any particular way? It feels weird to me that we have the world of cogerm 1-forms with the commutative $d$, and the world of exterior forms with the exterior $d\wedge$, which agree in the world of linear degree-1 1-forms and the differential of functions, but are thereafter completely unrelated.

    • CommentRowNumber90.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 20th 2014
    • (edited Apr 2nd 2015)

    Can an absolute 1-form be regarded as a cojet form like $|d x|$ defined by

    $$ \langle {|\omega|} ; c\rangle = {\Big|\langle \omega ; c\rangle\Big|} ? $$

    I would certainly accept this definition of ${|\omega|}$ in line with the previous discussion of $f(\omega)$ (where $\omega$ is a cojet form, or more generally a finite list of such, and $f$ is a differentiable function); there's no reason that $f$ has to be differentiable (we just can't conclude that $f(\omega)$ is differentiable).

    So I guess that your question is: if $\omega$ is an exterior $1$-form, then is this ${|\omega|}$ the absolute $1$-form called ${|\omega|}$ on the absolute differential form page? And the answer is Yes; at least, it certainly does the right thing to a curve.

    But not every absolute $1$-form arises in this way! Besides multiplying by an arbitrary $0$-form (so that an absolute $1$-form need not be positive semidefinite), even some positive definite forms, such as $\sqrt{\mathrm{d}x^2 + \mathrm{d}y^2}$, don't arise in this way.

    Nevertheless, any absolute $1$-form does have an action on curves (via their tangent vectors, if you follow the definition at absolute differential form), and this is homogeneous of degree $1$, so your integration formula does integrate them.

    • CommentRowNumber91.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 20th 2014

    It feels weird to me that we have the world of cogerm 1-forms with the commutative $d$, and the world of exterior forms with the exterior $d\wedge$, which agree in the world of linear degree-1 1-forms and the differential of functions, but are thereafter completely unrelated.

    There is some more overlap if you look at symmetric bilinear forms (rather than only the antisymmetric ones that are exterior $2$-forms). Some cojet (or cogerm) forms are linear, and these agree with the exterior $1$-forms; but some cojet forms are quadratic, and these agree with the symmetric bilinear forms. Of course, these are viewed as functions of different things, but they are equivalent by the polarization identities. An arbitrary bilinear form is then given by a quadratic cojet form together with an exterior $2$-form.
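At a single point, and looking only at first-order (tangent-vector) data, the decomposition in the last sentence can be checked directly. From a bilinear form $B$ take the quadratic form $Q(v) = B(v,v)$ (what a quadratic cojet form sees on a curve with velocity $v$) and the antisymmetric part $A$ (an exterior $2$-form), then recover $B$ by polarization. The matrix-based $B$ and the helper names are illustrative assumptions.

```python
def bilinear_from_matrix(M):
    """B(v, w) = sum_ij v_i M_ij w_j."""
    return lambda v, w: sum(v[i] * M[i][j] * w[j]
                            for i in range(len(v)) for j in range(len(w)))

def quadratic_part(B):
    """Q(v) = B(v, v); only the symmetric part of B survives here."""
    return lambda v: B(v, v)

def antisymmetric_part(B):
    """A(v, w) = (B(v, w) - B(w, v)) / 2, an exterior 2-form."""
    return lambda v, w: (B(v, w) - B(w, v)) / 2

def reassemble(Q, A):
    """Polarize Q to get the symmetric part of B, then add back A."""
    def B(v, w):
        s = tuple(a + b for a, b in zip(v, w))
        return (Q(s) - Q(v) - Q(w)) / 2 + A(v, w)
    return B

B = bilinear_from_matrix([[1, 2], [3, 4]])
B2 = reassemble(quadratic_part(B), antisymmetric_part(B))
print(B((1, 0), (0, 1)), B2((1, 0), (0, 1)))  # 2 2.0
```

The quadratic form alone loses the antisymmetric part (here $Q$ cannot distinguish the off-diagonal entries $2$ and $3$), which is exactly what the exterior $2$-form supplies.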

    This doesn't go so easily into higher rank.

    • CommentRowNumber92.
    • CommentAuthorMike Shulman
    • CommentTimeFeb 24th 2014

    I thought it was about time to record some of this discussion, so I created cogerm differential form.

    • CommentRowNumber93.
    • CommentAuthorTobyBartels
    • CommentTimeFeb 24th 2014

    Looks good! I discussed it in a thread dedicated to it. (Mike already noticed this, but I record it for the sake of future generations.)

    • CommentRowNumber94.
    • CommentAuthorMike Shulman
    • CommentTimeMay 18th 2015

    Re: #87, the sum of the coefficients of terms in $\mathrm{d}^n f$ involving $f^{(k)}(x)$ is the Stirling number of the second kind $S(n,k)$: the number of ways to partition an $n$-element set into $k$ nonempty subsets. The coefficients themselves are simply the further classification of these partitions according to the multiset of cardinalities of the $k$ nonempty subsets (which feels like it ought to have something to do with Young tableaux). This is more obvious if we use the coflare differentials where $d_1 d_0 \neq d_0 d_1$: then none of the terms can be combined, and each term like $\mathrm{d}_{2}\mathrm{d}_0 x \, \mathrm{d}_3 \mathrm{d}_1 x$ evidently represents a particular partition of an $n$-element set into $k$ nonempty subsets.
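The counting claim can be checked by brute force. This sketch enumerates the set partitions of $\{1,\ldots,n\}$, groups them by the sorted list of block sizes (the same key $(k, (i_1,\ldots,i_k))$ one would use for the term $f^{(k)}(x)\,d^{i_1}x\cdots d^{i_k}x$), and compares the totals for each $k$ with $S(n,k)$ computed from the standard recurrence $S(n,k) = k\,S(n-1,k) + S(n-1,k-1)$. The function names are illustrative.

```python
from collections import Counter
from functools import lru_cache

def set_partitions(elements):
    """Yield every partition of a list into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def coefficients(n):
    """Count partitions of {1..n} by (number of blocks, sorted block sizes)."""
    counts = Counter()
    for part in set_partitions(list(range(1, n + 1))):
        sizes = tuple(sorted(len(block) for block in part))
        counts[(len(part), sizes)] += 1
    return counts

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(coefficients(4))
```

For example, `coefficients(4)` should reproduce the coefficients $1, 6, 3, 4, 1$ of $d^4 f(x)$ from #87. The coflare refinement is not modelled here.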

    • CommentRowNumber95.
    • CommentAuthorTobyBartels
    • CommentTimeMay 19th 2015
    • (edited May 19th 2015)

    In coflare differentials, I don't think that $\mathrm{d}_0\mathrm{d}_1x$ makes sense at all; in any case, it doesn't show up in $\mathrm{d}^{n}f(x)$. That's just as well, since the Stirling number doesn't count $\{\{0,1\}\}$ and $\{\{1,0\}\}$ as distinct partitions of $2$ into $1$ nonempty subset.

    • CommentRowNumber96.
    • CommentAuthorMike Shulman
    • CommentTimeMay 20th 2015

    Yes, that’s true; I think I meant to say something like $d_1d_0 \neq d_2d_0$.

    • CommentRowNumber97.
    • CommentAuthorTobyBartels
    • CommentTimeMay 25th 2015

    Or simply that $\mathrm{d}_0 \neq \mathrm{d}_1$. Either will do, since the first nontrivial coefficient comes from combining $\mathrm{d}_2\mathrm{d}_1x \,\mathrm{d}_0x$, $\mathrm{d}_1x \,\mathrm{d}_2\mathrm{d}_0x$, and $\mathrm{d}_2x \,\mathrm{d}_1\mathrm{d}_0x$, where already for each pair there are two differences between them.

    • CommentRowNumber98.
    • CommentAuthorTobyBartels
    • CommentTimeJul 19th 2015

    On the subject of partial derivatives, John Denker makes the interesting point that

    $$ \Big(\frac{\partial{u}}{\partial{x}}\Big)_{y,z} = \frac{\mathrm{d}u \wedge \mathrm{d}y \wedge \mathrm{d}z}{\mathrm{d}x \wedge \mathrm{d}y \wedge \mathrm{d}z} $$

    at http://www.av8n.com/physics/partial-derivative.htm#sec-wedge-ratio. This is easy enough to verify by calculation, but also check out the pictorial explanation.
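In three variables any top-degree wedge product is a multiple of any other, so the ratio is a ratio of $3\times 3$ Jacobian determinants. That makes the identity easy to test numerically in whatever coordinates the quantities happen to be given in. In this sketch (our own example, not taken from Denker's page) $x, y, z$ are cylindrical-type functions of underlying coordinates $(p, q, w)$ and $u = xyz$, so $(\partial u/\partial x)_{y,z} = yz$; derivatives are approximated by central differences.

```python
import math

H = 1e-5  # central-difference step (arbitrary choice)

def gradient(f, point):
    """Partial derivatives of f with respect to each underlying coordinate."""
    row = []
    for j in range(len(point)):
        up, dn = list(point), list(point)
        up[j] += H
        dn[j] -= H
        row.append((f(*up) - f(*dn)) / (2 * H))
    return row

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def wedge_ratio(u, x, y, z, point):
    """(du ∧ dy ∧ dz) / (dx ∧ dy ∧ dz) at a point of the underlying coordinates."""
    num = det3([gradient(f, point) for f in (u, y, z)])
    den = det3([gradient(f, point) for f in (x, y, z)])
    return num / den

x = lambda p, q, w: p * math.cos(q)
y = lambda p, q, w: p * math.sin(q)
z = lambda p, q, w: w
u = lambda p, q, w: x(p, q, w) * y(p, q, w) * z(p, q, w)

print(wedge_ratio(u, x, y, z, (2.0, math.pi / 6, 3.0)))  # close to y*z = 3
```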

    • CommentRowNumber99.
    • CommentAuthorTobyBartels
    • CommentTimeNov 4th 2020

    Trying to make the previous comment work with second derivatives:

    Suppose that $u$ is a function of $x$. Then

    $$ \mathrm{d} u = \frac{\partial u}{\partial x} \, \mathrm{d} x , $$

    so

    $$ \frac{\partial u}{\partial x} = \frac{\mathrm{d} u}{\mathrm{d} x} . $$

    Thus,

    $$ \frac{\partial^2 u}{\partial x^2} = \frac{\partial \left( \frac{\partial u}{\partial x} \right)}{\partial x} = \frac{\mathrm{d} \left( \frac{\mathrm{d} u}{\mathrm{d} x} \right)}{\mathrm{d} x} , $$

    which expands to

    $$ \frac{\partial^2 u}{\partial x^2} = \frac{\mathrm{d} x \, \mathrm{d}^2 u - \mathrm{d} u \, \mathrm{d}^2 x}{\mathrm{d} x^3} . $$

    On the other hand,

    $$ \mathrm{d}^2 u = \frac{\partial^2 u}{\partial x^2} \, \mathrm{d} x^2 + \frac{\partial u}{\partial x} \, \mathrm{d}^2 x , $$

    so

    $$ \mathrm{d}^2 u \wedge \mathrm{d}^2 x = \frac{\partial^2 u}{\partial x^2} \, \mathrm{d} x^2 \wedge \mathrm{d}^2 x , $$

    so

    $$ \frac{\partial^2 u}{\partial x^2} = \frac{\mathrm{d}^2 u \wedge \mathrm{d}^2 x}{\mathrm{d} x^2 \wedge \mathrm{d}^2 x} . $$
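The first-order expression $(\mathrm{d}x\,\mathrm{d}^2u - \mathrm{d}u\,\mathrm{d}^2x)/\mathrm{d}x^3$ can be checked numerically. Evaluated on a curve $t \mapsto x(t)$, each $\mathrm{d}^k$ becomes the $k$-th derivative with respect to $t$, so the quotient should equal $u''(x(t))$ whatever the parametrization. The sketch uses central differences, $u = \sin$, and two hypothetical parametrizations that pass through the same value $x = 1.19$.

```python
import math

def second_derivative_via_differentials(u, x, t, h=1e-4):
    """(dx d^2u - du d^2x) / dx^3, with each differential evaluated on the
    curve t -> x(t) as a t-derivative (central differences)."""
    def d1(g):
        return (g(t + h) - g(t - h)) / (2 * h)
    def d2(g):
        return (g(t + h) - 2 * g(t) + g(t - h)) / (h * h)
    ux = lambda s: u(x(s))
    return (d1(x) * d2(ux) - d1(ux) * d2(x)) / d1(x) ** 3

x1 = lambda t: t * t + t   # x1(0.7) = 1.19
x2 = math.exp              # x2(log 1.19) = 1.19
print(second_derivative_via_differentials(math.sin, x1, 0.7),
      second_derivative_via_differentials(math.sin, x2, math.log(1.19)),
      -math.sin(1.19))
```

All three printed values should agree to about five decimal places, even though the two parametrizations have quite different $\mathrm{d}x$ and $\mathrm{d}^2x$.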

    Now suppose that $u$ is a function of $x$ and $y$. Then

    $$ \mathrm{d} u = \frac{\partial u}{\partial x} \, \mathrm{d} x + \frac{\partial u}{\partial y} \, \mathrm{d} y , $$

    so

    $$ \mathrm{d} u \wedge \mathrm{d} y = \frac{\partial u}{\partial x} \, \mathrm{d} x \wedge \mathrm{d} y , $$

    so

    $$ \frac{\partial u}{\partial x} = \frac{\mathrm{d} u \wedge \mathrm{d} y}{\mathrm{d} x \wedge \mathrm{d} y} . $$

    Thus,

    $$ \frac{\partial^2 u}{\partial x^2} = \frac{\partial \left( \frac{\partial u}{\partial x} \right)}{\partial x} = \frac{\mathrm{d} \left( \frac{\mathrm{d} u \wedge \mathrm{d} y}{\mathrm{d} x \wedge \mathrm{d} y} \right) \wedge \mathrm{d} y}{\mathrm{d} x \wedge \mathrm{d} y} , $$

    which unfortunately can't be expanded without abandoning the $\wedge$ notation.

    On the other hand,

    $$ \mathrm{d}^2 u = \frac{\partial^2 u}{\partial x^2} \, \mathrm{d} x^2 + 2 \frac{\partial^2 u}{\partial x \partial y} \, \mathrm{d} x \, \mathrm{d} y + \frac{\partial^2 u}{\partial y^2} \, \mathrm{d} y^2 + \frac{\partial u}{\partial x} \, \mathrm{d}^2 x + \frac{\partial u}{\partial y} \, \mathrm{d}^2 y , $$

    so

    $$ \mathrm{d}^2 u \wedge \mathrm{d} x \, \mathrm{d} y \wedge \mathrm{d} y^2 \wedge \mathrm{d}^2 x \wedge \mathrm{d}^2 y = \frac{\partial^2 u}{\partial x^2} \, \mathrm{d} x^2 \wedge \mathrm{d} x \, \mathrm{d} y \wedge \mathrm{d} y^2 \wedge \mathrm{d}^2 x \wedge \mathrm{d}^2 y , $$

    so

    $$ \frac{\partial^2 u}{\partial x^2} = \frac{\mathrm{d}^2 u \wedge \mathrm{d} x \, \mathrm{d} y \wedge \mathrm{d} y^2 \wedge \mathrm{d}^2 x \wedge \mathrm{d}^2 y}{\mathrm{d} x^2 \wedge \mathrm{d} x \, \mathrm{d} y \wedge \mathrm{d} y^2 \wedge \mathrm{d}^2 x \wedge \mathrm{d}^2 y} . $$
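One way to read the wedges in the last formula (an interpretation assumed by this sketch, not spelled out above): at a point, the second-order quantities form a $5$-dimensional space with basis $\mathrm{d}x^2, \mathrm{d}x\,\mathrm{d}y, \mathrm{d}y^2, \mathrm{d}^2x, \mathrm{d}^2y$, and a wedge of five of them is a determinant, so the quotient picks out the $\mathrm{d}x^2$-coefficient of $\mathrm{d}^2u$. That makes it testable when everything is expressed in other coordinates. Below, all quantities are expanded in the polar basis $\mathrm{d}r^2, \mathrm{d}r\,\mathrm{d}\theta, \mathrm{d}\theta^2, \mathrm{d}^2r, \mathrm{d}^2\theta$ (using the two-variable formula for $\mathrm{d}^2$ above with $r, \theta$ in place of $x, y$), for the hypothetical example $u = x^2y + y^3$, where $\partial^2u/\partial x^2 = 2y$.

```python
import math

H = 1e-4  # finite-difference step (arbitrary choice)

def grad(g, r, th):
    """(g_r, g_θ): coefficients of dg in the basis dr, dθ."""
    return ((g(r + H, th) - g(r - H, th)) / (2 * H),
            (g(r, th + H) - g(r, th - H)) / (2 * H))

def second_differential(g, r, th):
    """Coefficients of d^2 g in the basis dr^2, dr dθ, dθ^2, d^2 r, d^2 θ."""
    g_r, g_t = grad(g, r, th)
    g_rr = (g(r + H, th) - 2 * g(r, th) + g(r - H, th)) / H ** 2
    g_tt = (g(r, th + H) - 2 * g(r, th) + g(r, th - H)) / H ** 2
    g_rt = (g(r + H, th + H) - g(r + H, th - H)
            - g(r - H, th + H) + g(r - H, th - H)) / (4 * H ** 2)
    return [g_rr, 2 * g_rt, g_tt, g_r, g_t]

def product(a, b):
    """Coefficients of (da)(db), given a = (a_r, a_θ) and b = (b_r, b_θ)."""
    return [a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[1] * b[1], 0.0, 0.0]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, result = len(m), 1.0
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(m[i][col]))
        if m[p][col] == 0.0:
            return 0.0
        if p != col:
            m[col], m[p] = m[p], m[col]
            result = -result
        result *= m[col][col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n):
                m[i][j] -= f * m[col][j]
    return result

def uxx_by_wedge_ratio(u, x, y, r, th):
    """Ratio of the two 5-fold wedges, computed as determinants in polar basis."""
    dx, dy = grad(x, r, th), grad(y, r, th)
    shared = [product(dx, dy), product(dy, dy),
              second_differential(x, r, th), second_differential(y, r, th)]
    num = det([second_differential(u, r, th)] + shared)
    den = det([product(dx, dx)] + shared)
    return num / den

x = lambda r, th: r * math.cos(th)
y = lambda r, th: r * math.sin(th)
u = lambda r, th: x(r, th) ** 2 * y(r, th) + y(r, th) ** 3

r0, th0 = 1.5, 0.7
print(uxx_by_wedge_ratio(u, x, y, r0, th0), 2 * y(r0, th0))  # should nearly agree
```

Because the numerator and denominator share their last four rows in the same order, any sign or scaling conventions for the basis cancel in the ratio.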
    • CommentRowNumber100.
    • CommentAuthorTobyBartels
    • CommentTimeDec 17th 2020
    • (edited Dec 17th 2020)

    Re #87:

    The coefficients appearing here are those that appear in Bell polynomials, and they are well known (although not by me, until yesterday) both to come from counting partitions and to give a formula for the higher derivatives of a composite function, Faà di Bruno's formula. This formula gives the higher cojet differentials of $f(x)$, where $f$ is a real-valued function of a real variable, differentiable at least $n$ times, and $x$ is a real-valued quantity (technically a real-valued function on some manifold), also differentiable at least $n$ times:

    $$ \mathrm{d}^n\big(f(x)\big) = \sum_\pi f^{({|\pi|})}(x) \prod_{B\in{\pi}} \mathrm{d}^{|B|}x , $$

    where the sum is taken over the set of all partitions of $\{1,\ldots,n\}$, each partition $\pi$ being thought of as a subset of the powerset of $\{1,\ldots,n\}$ (so that both $\pi$ and any $B \in \pi$ have a cardinality given by ${|{\cdot}|}$).
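On a curve, each $\mathrm{d}^k x$ evaluates to a $k$-th derivative, so the cojet formula can be tested against the classical Faà di Bruno formula. This sketch does so exactly, with integer polynomials: it takes the parameter line itself as the manifold (an illustrative assumption), so that $\mathrm{d}^k x$ becomes $x^{(k)}(t)$, sums over the set partitions of $\{1,\ldots,n\}$ as in the displayed formula, and compares the result with the $n$-th derivative of the composite polynomial $f(x(t))$. The polynomials $f(y) = y^3 + 2y$ and $x(t) = t + t^2$ are hypothetical examples.

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_compose(f, x):
    """Coefficients of f(x(t)), by Horner's rule."""
    out = [0]
    for coeff in reversed(f):
        out = poly_add(poly_mul(out, x), [coeff])
    return out

def poly_deriv(p, k=1):
    for _ in range(k):
        p = [i * p[i] for i in range(1, len(p))] or [0]
    return p

def poly_eval(p, t):
    return sum(c * t ** i for i, c in enumerate(p))

def set_partitions(elements):
    """Yield every partition of a list into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def faa_di_bruno(f, x, n, t):
    """Sum over partitions pi of {1..n} of f^(|pi|)(x(t)) * prod_B x^(|B|)(t)."""
    total = 0
    for pi in set_partitions(list(range(1, n + 1))):
        term = poly_eval(poly_deriv(f, len(pi)), poly_eval(x, t))
        for block in pi:
            term *= poly_eval(poly_deriv(x, len(block)), t)
        total += term
    return total

f = [0, 2, 0, 1]  # f(y) = 2y + y^3
x = [0, 1, 1]     # x(t) = t + t^2
for n in range(1, 6):
    direct = poly_eval(poly_deriv(poly_compose(f, x), n), 2)
    print(n, direct, faa_di_bruno(f, x, n, 2))
```

Since everything is integer arithmetic, the two columns should agree exactly, not just approximately.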

    A partly multivariable version of the formula may be adapted to coflare forms. First some notation: if $A = \{i_1,i_2,\ldots,i_n\}$ is a finite multisubset of $\mathbb{N}$, then write $\mathrm{d}^A{u}$ for $\mathrm{d}_{i_1}\mathrm{d}_{i_2}{\cdots}\mathrm{d}_{i_n}u$ (which is unambiguously defined if $u$ is at least $n$ times differentiable). Also, if $B \subseteq \{1,\ldots,n\}$ (a set, not a multiset), then let $i_B$ be $\{i_j \;|\; j \in B\}$ (a multiset). With this notation,

    $$ \mathrm{d}^A\big(f(x)\big) = \sum_\pi f^{({|\pi|})}(x) \prod_{B\in{\pi}} \mathrm{d}^{i_B}x , $$

    a partial decategorification of the cojet version.

    A fully multivariable version of the formula would also allow $f$ to be a function of $m$ variables, with $\mathrm{d}\big(f(x_1,\ldots,x_m)\big) = \nabla{f}(x_1,\ldots,x_m) \cdot \langle{\mathrm{d}x_1,\ldots,\mathrm{d}x_m}\rangle = \sum_{j=1}^m \mathrm{D}_j{f}(x_1,\ldots,x_m) \mathrm{d}x_j$ as the order-$1$ case, but I haven't tried to think that through yet.

    ETA: You can take $A$ and $i_B$ to be tuples rather than multisets, if you prefer. But the order doesn't matter, just as with partial derivatives.