A propos of the problems we continue to see with very large pages, as described most recently today here, and as promised in that thread, I have tried to make some kind of example to illustrate how the nLab would benefit from having a ’frontend’ that is more or less static (just serves HTML/CSS/Javascript, without any server-side programming or database layer).
You can see an example at https://ncatlab.org/experimental-frontend/18645.html. The key point is that this page loads more or less immediately (MathJax takes a while to run on it afterwards, but ignore that for the moment). On the nLab currently, this page takes a long time to appear, and this may be behind the problems we saw earlier today (and in any case, is clearly highly undesirable).
So what does the example do? Well, it is just a tiny (codewise) server (20 lines, to be precise), written in Go, which serves html pages to be found on the nLab server (exactly those which are copied to the nlab-content-html git repository), using the same CSS as the nLab. These html pages represent the ’fully generated’ nLab page, i.e. the Markdown has been parsed, links have been created, etc.
The purpose of the example is to illustrate a few points:
1) There is something terribly slow in Instiki’s page rendering. I have not isolated exactly where it is slow, but there is quite a bit of stuff going on before the page actually appears. There should be no need for any programming at all, it should be perfectly possible for the content to be static once a page has been made/edited.
2) There is nothing really significantly wrong with the Markdown generation aspect of the nLab.
3) We are not that far away from being able to change things for the better. It took me about 10 minutes to write the server (I have done it before, but still…), which is of ’production quality’, which is jargon for: it can easily handle the full load of nLab traffic, indeed many times more. I have done absolutely nothing to the HTML files, or indeed anything at all, except write the server and tweak the nginx config.
And now a few notes.
1) To show a page, you need to use https://ncatlab.org/experimental-frontend/m.html, replacing m by the number of some page. If there’s a particular page that you would like to see, you can ask me to try to find the number for you, or you can search in the git repository I mentioned.
2) No links or anything else will work.
3) Some of the Javascript does not work (numbering of theorems, etc). I deliberately wished to touch absolutely nothing, so that we know exactly where we are.
Where to go from here? Well, I’d be interested in your feedback. It would probably be quite easy to make this actually usable, i.e. getting correct links and making the Javascript work. Once we’re happy with it, we could actually switch nlab/show (which all pages are prefixed with in the URL when one loads them, as opposed to editing them or looking at revision history, etc) to point to the static site, if people wished (and if the nlab-content-html directory is updated immediately when a page is edited; I’m not sure whether or not that is the case yet, but we could enforce it if it isn’t). The remainder of the site could be as before, and we could then gradually swap out that bit too.
I’d also like to experiment with speeding up the MathJax. I think the only way is to render it server side. This appears to be possible; I have not done it before, but I will give it a go as soon as I get the chance.
If people like these ideas, there could be lots of small tasks if people would like to contribute. Everything will be able to be built locally, so one wouldn’t need access to the nLab server, one could just make and test the change locally and push it to github, and then I/Adeel can review it and merge it.
Thanks David, yes, MathJax node is what I plan to try. I’ve tried it a bit on my own machine, it looks promising. Basically what would happen is that when one creates/edits a page, we run it through MathJax node to create the HTML or SVG, and then we just load the result statically, like is done in the ’experimental frontend’.
Awesome! A static frontend is something I’ve thought about for a long time, but never actually got around to implementing. Needless to say I definitely support the idea. One question: why not just use Nginx’s built-in static server, surely that must be even faster than additionally going through your Go server?
Interesting! Thanks for trying this out. For me, https://ncatlab.org/experimental-frontend/18645.html takes about 11.5 seconds to load, compared to about 15 seconds for geometry of physics – A first idea of quantum field theory – a significant difference, certainly, but not what I would call “more or less immediately”. How long does it take for you?
That was in Firefox BTW. In Chrome, the times are 6 and 9 seconds respectively, faster but about the same ratio. (But then, as you say, Chrome takes forever to render the MathJax, whereas Firefox renders the MathML instantly. Please don’t do anything to change that!)
Re #4: Thanks very much for your thoughts, Adeel! Really great that you support the idea! It goes without saying that I will of course under no circumstances make any substantial change like this live, i.e. non-experimental, without consulting you :-).
One question: why not just use Nginx’s built-in static server, surely that must be even faster than additionally going through your Go server?
Good question! Quite simply, it had not occurred to me! One thought I did have was that I might in the end need the additional flexibility that the Go server would give, but I certainly agree that nginx would be even faster. Let’s give it a try!
Re #5 and #6: Thanks very much, Mike, this is extremely useful feedback. I see similar numbers today. Yesterday, what happened was that it took over a minute before anything appeared on the screen when trying to load geometry of physics – A first idea of quantum field theory, whereas the same behaviour as today occurred for the experimental server, maybe even slightly faster: one got a more or less loaded page within a few seconds. Maybe more or less immediately was stretching the point a bit! But the experience is at least reasonable for the user.
The difference between today and yesterday is that yesterday there was no cache of the page, and today there was. Actually, it seems that Firefox (or doing a GET request via curl, which is probably the best way to do it, to avoid any browser overhead) loads the page quickly enough to trigger a cache, whereas this does not occur in Chromium (or the webkit browser I am using, ’qutebrowser’, whose behaviour is typically similar to Chromium). If I remove the cache and try in qutebrowser/Chromium, I get the same behaviour as yesterday; I do not typically use Firefox, so did not generate the cache myself yesterday.
So this shows that the following was in fact wrong.
1) There is something terribly slow in Instiki’s page rendering. I have not isolated exactly where it is slow, but there is quite a bit of stuff going on before the page actually appears. There should be no need for any programming at all, it should be perfectly possible for the content to be static once a page has been made/edited.
2) There is nothing really significantly wrong with the Markdown generation aspect of the nLab
We see now, as Adeel has suspected before, that 2) is wrong. There is nothing significantly wrong with the HTML that is produced, but there is something significantly wrong with the production of it. There is still truth in 1), in that Instiki’s page rendering is a bit slow and the programming layer is unnecessary, but ’terribly slow’ seems inaccurate: ’mildly slow’ would be better.
We should note though that it is not only performance reasons that might lead us to have a static frontend, it is also infrastructurally a bit more robust and modular (in particular, easier to test and isolate where problems actually are). But performance is the most obvious thing we need to improve.
(But then, as you say, Chrome takes forever to render the MathJax, whereas Firefox renders the MathML instantly. Please don’t do anything to change that!)
Absolutely. The reason I would like to do gradual ’experiments’ is exactly to gather feedback so that we can make a good decision. Please keep the feedback coming! We’ll have to see how Firefox handles the HTML/SVG produced by MathJax node. Hopefully the performance will be as good as before. If not, we’ll have to make an exception for Firefox.
I’ve now added nginx’s built-in static server, as suggested by Adeel in #4, as an example, at https://ncatlab.org/nginx-experimental-frontend/18645.html. It does seem very snappy for me at the moment; indeed it really does seem more or less immediate in qutebrowser/Chromium. The page is not completely loaded at that point, but something that looks more or less complete appears on the screen pretty much immediately, and complete loading (minus MathJax) follows a couple of seconds later.
One is never going to beat nginx for speed, but I think the Go server would be only slightly slower (indeed, it is lightning fast on my own machine) if it were requested directly; but since the request comes to nginx first and then is relayed to the Go server, we definitely lose time.
Would be great to get feedback from Mike and others on the speed of the nginx static server compared to the others.
More later, but I’m a bit confused, can you say exactly what you mean by “trigger a cache”? We’re talking about server-side caches, right? Can you explain why the speed of the request makes a difference? I don’t know much about the internals of how web servers and HTTP work.
My apologies for not being clear.
can you say exactly what you mean by “trigger a cache”?
At the moment, I have not gone deeply enough into this aspect of Instiki’s codebase to know exactly what happens, so I am going off observations of what happens on the server. Here are the things I know.
1) There is a ’cache’ folder on the nLab server where certain ’.cache’ files are stored. These are very similar to those in nlab-content-html. If present, these .cache files are basically served by Instiki when one loads a page (i.e. issues a GET request), though there is some programming going on, it is not just a straightforward ’shove the cache in the response body’.
2) If I wipe the cache (i.e. remove all files), which I sometimes need to do when making a change in Instiki, the .cache files are soon re-generated by Instiki. This roughly involves the markdown and tex that is stored in the database being parsed, and an HTML page being generated. This appears to be according to use, i.e. the first time a page is requested, and presumably after it has been edited, the .cache file seems to be generated. Since we have sites crawling the entire nLab, in practice it seems the entire cache is soon regenerated. This is what I meant by ’trigger a cache’, i.e. the first time a page is requested, it seems to lead to generation of a ’.cache’ file.
3) This generation of the .cache file does not happen when I load geometry of physics – A first idea of quantum field theory in qutebrowser or Chromium if the file was not there before. I presume this is because the load itself, which involves parsing the Markdown and tex source into HTML, takes too long. But I have not looked deeply enough into it to know the precise details. The .cache file is generated, however, if I load the page in Firefox, and also if I curl the page (i.e. do a GET request from the command line, outside of a browser), presumably because the uncached load is a bit faster in those cases.
Let me know if things are still unclear on this point.
We’re talking about server-side caches, right?
Correct.
Can you explain why the speed of the request makes a difference? I don’t know much about the internals of how web servers and HTTP work.
Sorry for not being clearer. If you were referring to why using nginx directly is faster than nginx + Go, it is just because the ’transport’ layer (i.e. literally sending the bytes from one server to another, as TCP packets) is relatively slow, so any additional transports will affect response time. The key word, here, though, is ’relatively’: usually the difference is trivial compared to the time it takes for other things to take place, and it is entirely usual for many servers to be involved behind a website. In the case of this page, though, there are several MB being shipped around, because of the picture, so I expect it makes a small difference (though I have not measured it) in addition to nginx itself being slightly quicker than the Go server.
(But then, as you say, Chrome takes forever to render the MathJax, whereas Firefox renders the MathML instantly. Please don’t do anything to change that!)
We’ll have to see how Firefox handles the HTML/SVG produced by MathJax node. Hopefully the performance will be as good as before. If not, we’ll have to make an exception for Firefox.
Please never drop MathML output. The other formats are opaque if you try to inspect and debug them. There are ongoing efforts (mainly by Frédéric Wang) to support MathML in Chrome and they seem to have it working in WebKit (though I don’t know if it has made it to an official Safari release).
Please never drop MathML output.
Thanks for the feedback. Of course we will not do anything without consensus.
Personally, I am not convinced by the advantages of MathML over the following.
The other formats are opaque if you try to inspect and debug them
My focus for now is mainly a pragmatic one, of obtaining a performant and robust nLab. In this vein, my present thoughts are that I would happily sacrifice MathML if the performance is as good and it means that we avoid the added complexity of having to do something browser-dependent.
I do understand that MathML has significant semantic advantages, e.g. for search engines, so I would happily adopt it if all conditions are met. But if we use it, I would suggest that we need to have good tools to produce it, and we need to have good performance across all browsers. Currently we have neither of these (Adeel and I agree that itex2MML is a major piece of ’technical debt’, as the jargon goes).
Of course, MathJax can produce MathML, so it is not a purely either-or situation, it is more a question of simplicity and consistency.
Just a note for the curious that I have begun working on a simple parser which will pass tex into mathjax-node-cli, handle Definition/Theorem/etc environments, and otherwise use a fast existing parser for converting Markdown to HTML. The use of mathjax-node-cli is working fine, and the results look good and load instantaneously in the browser; it is now just a question of going through the markdown source to extract tex snippets and feed them in. It should be possible to do this and handle Definition/Theorem environments and numbering in one pass through the file, so I expect that the parser should be speedy. Will update when I have an example ready. In the meantime, please continue with any thoughts about the examples so far.
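To make the "extract tex snippets in one pass" step concrete, here is a rough Go sketch. It is not the parser being written, and it makes several simplifying assumptions: math is delimited by `$...$` (inline) and `$$...$$` (display), escaped dollar signs are not handled, and Definition/Theorem environments and numbering are left out entirely. The `render` callback is a stand-in for whatever hands the snippet to mathjax-node-cli.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceMath walks src once from left to right and replaces each
// $...$ or $$...$$ span with the result of render. Text outside math
// is copied through unchanged, as is an unmatched trailing delimiter.
func replaceMath(src string, render func(tex string, display bool) string) string {
	var out strings.Builder
	for {
		i := strings.IndexByte(src, '$')
		if i < 0 {
			break // no more math in the remaining text
		}
		out.WriteString(src[:i])
		src = src[i:]

		delim := "$"
		if strings.HasPrefix(src, "$$") {
			delim = "$$"
		}
		end := strings.Index(src[len(delim):], delim)
		if end < 0 {
			break // unmatched delimiter: leave the rest as plain text
		}
		tex := src[len(delim) : len(delim)+end]
		out.WriteString(render(tex, delim == "$$"))
		src = src[2*len(delim)+end:]
	}
	out.WriteString(src)
	return out.String()
}

func main() {
	src := `Let $G$ be a group. Then $$g \cdot e = g$$ for all $g$.`
	// Placeholder renderer: a real one would return the SVG or MathML
	// produced by the external converter.
	html := replaceMath(src, func(tex string, display bool) string {
		if display {
			return `<div class="math">` + tex + `</div>`
		}
		return `<span class="math">` + tex + `</span>`
	})
	fmt.Println(html)
}
```

Because the function only ever moves forward through the source, its cost is linear in the page length apart from the time spent in `render`, which is the property the post is hoping for.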
Thanks for the clarification. I still don’t quite understand why the speed of request would influence whether the cache is filled, but I’ll take your word for it.
Can mathjax-node produce mathml server-side?
One complication with caching/static files, which I’m sure you’re aware of, is that with the current setup, editing one page (e.g. by adding or removing redirects) can cause other pages to need to be regenerated. But that could be eliminated by making all redirects happen at request time rather than actually having the link in the generated html point to the redirected page. Of course in any case the static webserver needs to be “aware” of the redirects somehow.
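A small Go sketch of the request-time approach suggested here: the generated HTML keeps the link as written, and the server consults a redirect table when the request arrives. The `redirects` type, the `resolve` method, and the page names are made up for illustration; how the table would be kept in sync with page edits is not addressed.

```go
package main

import "fmt"

// redirects maps a redirecting name to the page it points at.
type redirects map[string]string

// resolve follows redirects from name until it reaches a name with no
// further redirect. A cycle guard stops it looping forever on bad data.
func (r redirects) resolve(name string) string {
	seen := map[string]bool{}
	for {
		target, ok := r[name]
		if !ok || seen[name] {
			return name
		}
		seen[name] = true
		name = target
	}
}

func main() {
	// Hypothetical table; in practice it would be rebuilt on each edit.
	r := redirects{
		"QFT":                     "quantum field theory",
		"quantum field theories":  "quantum field theory",
	}
	fmt.Println(r.resolve("QFT"))   // a redirecting name
	fmt.Println(r.resolve("group")) // a name with no redirect
}
```

With this design, adding or removing a redirect changes only the table, so no other page's static HTML needs to be regenerated.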
I still don’t quite understand why the speed of request would influence whether the cache is filled,
It seems to me Richard said that there may be an independent timeout for caching pages: The caching agent requests a page in order to save it in cache, but the reply takes forever and so the caching agent gives up.
Re #15: exactly, thanks! I cannot confirm this without going more deeply into the codebase, but what you describe is indeed exactly what I suspect.
Can mathjax-node produce mathml server-side?
Yes. What I envision is that we make the design as flexible as possible. Even if we do not use MathML on the main nLab, it should be possible to simply pass a parameter when calling the backend API for converting source into HTML so that it outputs MathML instead of SVG. This will allow tools which really need MathML (Bas has brought to my attention one such project for example) to still make use of the data in the nLab. If we ever wanted to change to MathML on the main lab, it would again just be the case that we add one parameter when calling the backend from the frontend. This is the kind of thing I mean by a ’robust’ and more modular nLab.
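As a rough illustration of "just pass a parameter", here is a Go sketch in which the output format selects which converter is run. The backend API does not exist yet, so `converterFor` and `renderTeX` are hypothetical names. The command names `tex2svg` and `tex2mml` are, to my understanding, the ones installed by mathjax-node-cli, but this should be checked against the installed version before relying on it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// converters maps an output-format parameter to the external command
// assumed to produce it (command names to be verified locally).
var converters = map[string]string{
	"svg":    "tex2svg",
	"mathml": "tex2mml",
}

// converterFor returns the command for the requested format.
func converterFor(format string) (string, error) {
	name, ok := converters[format]
	if !ok {
		return "", fmt.Errorf("unsupported math output format %q", format)
	}
	return name, nil
}

// renderTeX runs the converter for the requested format on one snippet
// and returns whatever it writes to standard output.
func renderTeX(format, tex string) ([]byte, error) {
	name, err := converterFor(format)
	if err != nil {
		return nil, err
	}
	return exec.Command(name, tex).Output()
}

func main() {
	for _, format := range []string{"svg", "mathml", "png"} {
		name, err := converterFor(format)
		fmt.Printf("format=%s command=%q err=%v\n", format, name, err)
	}
	// This call fails with an error if the converter is not installed.
	if out, err := renderTeX("mathml", `x^2`); err != nil {
		fmt.Println("render failed:", err)
	} else {
		fmt.Printf("%s\n", out)
	}
}
```

The frontend would then differ between an SVG site and a MathML site only in the value of the format argument, which is the flexibility described above.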
editing one page (e.g. by adding or removing redirects) can cause other pages to need to be regenerated.
Did you perhaps mean ’includes’ rather than redirects? As you say, we would indeed need to handle this, and it is important we remember to do so, thanks! I do have it in mind, and have at least one solution in mind; let’s discuss that once we’ve got the basics working (no time today to do much).
I really did mean redirects, but of course includes also have that issue.
I also really want the main nLab to remain MathML for browsers that support it.
I second Mike’s call for MathML to remain in place for those that use the browser(s) that support it.
I really did mean redirects
Thanks for clarifying! I see what you mean now; yes, as you say, the same kind of thing needs to be done as for includes.
Regarding MathML, thank you both for your thoughts. Again, we will of course not implement anything without consensus, so it looks like we will do what you suggest. Let us wait until we have examples of the different kinds to look at, though, before making a final decision.
Also, it would be great if you could both elaborate a bit as to your reasons, again, to help us all make a good, informed decision. For browser detection comes with its own downsides, and we need to balance those downsides against the things in favour of MathML.
Certainly, I am potentially open to being convinced to drop MathML; I just want to be sure that we don’t drop it without at least an approximate consensus (so thanks for the reassurance). Some of the reasons I want to keep it include:
Thanks very much, Mike, this is very helpful! As you suggest, let’s see, when I have some examples ready, how the first four of your points look for MathML compared to the other options. If MathML is clearly superior, there is no argument. If the options are more or less comparable on these four points, or maybe even other options are better in certain cases, then we can weigh the last two of your points against the downsides.
@Michael
thanks for the ETA for MathML in Chromium. Nice to know it’s not an “any day/month/year now….” project.