The Lab is having some problems. It goes slow, to the extent of not responding. A restart makes it come back for a minute, but then it goes off doing something else again. Also, every second time that I restart it (out of four restarts now, within maybe two hours) the restart command returns with an error message:
promptn1:65 bad math expression: operand expected at '> 1'
Actually, it does not come back at all anymore at the moment. Restarting won’t help. Maybe restarting actually makes things worse, if it prevents the Lab from finishing whatever it is that it is determined to do?
Whoa! Major Zombie attack.
(A zombie process is one that’s finished but refuses to die. It’s not really an attack in that it’s not directly to do with outside things - though there can be factors that trigger it - but to do with processes on the server not being dealt with properly.)
I think that a restart doesn’t clear the zombies, so if something is causing them, restarting doesn’t help much. Fortunately they can be killed, but they have to be killed explicitly. To find the zombies, do `ps auxc` from the command line and look for things that start with `nlab`, end with `ruby`, and have a `D` in the middle (specifically, in the 8th column). These are zombies and can be killed. To kill them, write `kill <PID>` (or `kill -9 <PID>` if you want to use heavy artillery), where `<PID>` is the number in the 2nd column.
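In case it helps, the procedure above can be scripted. This is only a sketch: it assumes the `ps aux` column layout just described (user in the 1st column, PID in the 2nd, STAT in the 8th), and it deliberately only lists the candidates until you swap the dry run for an actual `kill`:

```shell
# filter_zombies: read `ps aux`-style lines on stdin and print the PIDs
# of processes owned by nlab whose STAT (8th column) starts with D.
filter_zombies() {
    awk '$1 == "nlab" && $8 ~ /^D/ { print $2 }'
}

# Dry run: list the candidates first.
ps auxc | filter_zombies

# Once the list looks right, actually kill them (heavy artillery):
# ps auxc | filter_zombies | xargs -r kill -9
```

Do check the dry-run list before uncommenting the last line, since killing the wrong ruby process will cut off whoever it was serving.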
okay, thanks for the info!
I will look into this, by the way. I’m just not all that sure where to start!
But you have a systematic way of recognizing Zombies? Can a script recognize them too, then?
Dear Andrew,
I mean, is there a way to give a command to “automatically kill all processes with a D in that 8th column”? I am now busy killing Zombies one by one. It’s a bit tedious.
Yours, van Helsing
I see that there are processes belonging to “azimuth”. Why is that? Is that blog running on our server?
I now go and kill Zombies whenever the Lab slows down. One can watch it come back to life as the Zombies diminish. Say I have a handful of nLab pages open, all of them stalled: with the first Zombie killed, one of them suddenly gets data again; with the second one killed, the next one starts to appear; and so on.
But they keep coming back, of course. It feels like injecting blood into Lucy’s body while the bat keeps banging against the window, waiting for its chance.
However, Zombies alone don’t seem to be the problem. Right now the Lab is sluggish again, and I check and see no Dn-process at work…
Yeah, something else is wrong, too. I keep fighting the Zombies and they are gone for the moment. But the lab doesn’t respond anyway…
I just asked it to search for the phrase "compact site", which maybe was a bit vague. I hope I haven't broken it?
Maybe nobody knows what keeps “breaking the nLab”. I keep killing Zombies, keep gently restarting it when all Zombies are gone and it still does not wake up. But all in all, it is sluggish and down all the time. Something is keeping it busy. But I don’t know what. The Zombie processes are clearly part of it, but not the full answer. (And I am getting kind of sick of killing them.)
On the other hand: yes, it seems like all the Instiki functionality beyond just displaying pages (searches, displaying “recently revised”, and everything else which requires looking through the database) is still a cause of problems. Best not to use it.
If you want to search the nLab, use Google for it. Maybe we should switch off the search functionality here, too (it’s not that useful anyway, is it? I never use it), and switch off “recently revised” as well.
I actually use the search-function frequently, but I don’t have to, if it would help matters.
Oh, I didn’t know this. I strongly suggest that we all refrain from using any of the links at the top of a page. We know they cause trouble.
Also, the Google-searches for the nLab give much better results. Just add `site:ncatlab.org` to the end of your search query, or just use the search box at HomePage.
We know they cause trouble.
Count me then, henceforth, as belonging to that knowing ’we’.
the Google-searches for the nLab give much better results
That depends entirely on what one is searching for. Google searches for text as it appears on the page, while the internal search engine searches for source text. Both are useful.
I know that I use the Lab much less than Urs, but it’s hardly ever slow for me.
Could clicking on “See Changes” be causing a search? I can imagine some implementation of this that does.
Currently what happens when you are looking at a page (say version #10), then somebody edits the page (making #11), and then you click on “See Changes” without first refreshing? Do you see the #10-9 changes or the #11-10 changes?
I have just killed four Zombies, whose summed CPU use according to ps was over “90%”, whatever that means (the total percentage of all threads is a good bit over 100).
After that, one of my stalled pages appeared. But another one still doesn’t. I can see three processes owned by “nlab”, with STAT “R1”, currently using about “80%”.
May I ask again (somebody here will know this): what’s the command to kill all processes whose STAT starts with “D”?
Just a few more pieces of data, in case it helps any diagnosis:
after killing those Zombies, things didn’t improve noticeably. One page came to me, but then nothing else.
So I decided I could just as well continue killing also non-Zombie processes, to see if I catch one that is causing the trouble. (What does the “D” STAT refer to, anyway? Don’t feel obliged to reply.)
So I killed, one by one, the “nlab”-owned processes that had CPU usage above “40%”, and one that had a noticeably large “TIME” reading.
None of this had any noticeable effect on my end. Though I guess I probably deprived some other users (or bots) of their nLab service.
Anyway, after waiting a bit more, now it’s reacting again more normally.
No response now.
Up for me now.
Down for me right now. I logged into the server and saw several nlab/ruby processes using lots of memory, but I wasn’t able to restart it because I don’t think I have the password for sudo (only my ssh key on the server).
It’s down for me too.
The lab is down for me at the moment.
Still down it seems.
Just went on a zombie-splatting spree.
Just went on a zombie-splatting spree.
Did it help?
Maybe I am not doing it right, then. I had given up again on the “zombie-hunting” that you suggested; I have restarted Instiki three times today. I tried the “zombie” thing again, just to be sure, but with no real effect.
Maybe there is kind of an effect the second after killing the zombie. But the thing is that they all come back.
That is, if we are both talking about killing processes with STAT = D1.
What gives birth to zombies, anyway? I want to make sure it’s not some action I’m performing that may be responsible.
The zombies have taken over again! The lab seems to be down.
I have restarted once more. It was still immensely slow when it came back. Now it seems to be getting better.
To some extent restarting helps sometimes, but if there is that something happening that makes the Lab real slow, then it keeps doing so also after the restart… until it disappears by itself.
I’ve spent a day watching the zombies.
Technically, they aren’t zombies. They are processes in “uninterruptible sleep”, which is different from zombies. These are processes that need to do something like access the filesystem and temporarily can’t. Under normal circumstances, things like accessing the filesystem are meant to happen almost instantaneously, so the fact that these have to wait is an indication of something wrong lower down.
The processes do go in and out of this sleep quite a lot, a lot more than I would expect. Then it would seem that something tips and the whole lot slows down.
Anyway, watching how often I see these sleeps makes me think that it’s a problem with the hardware that our VPS is running on so I will get in touch with the provider and see if they have anything to say on the matter.
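For anyone who wants to watch along, counting the processes currently in this state is cheap to sample; a sketch using standard procps options (the 5-second loop is just a suggestion):

```shell
# count_dstate: print how many processes are currently in
# uninterruptible sleep (STAT beginning with D).
count_dstate() {
    # `ps -eo stat=` prints one STAT per process, no header;
    # "|| true" keeps the exit status 0 even when the count is zero.
    ps -eo stat= | grep -c '^D' || true
}

count_dstate

# To watch it over time, something like:
# while sleep 5; do echo "$(date +%T) $(count_dstate)"; done
```

A count that stays persistently above zero would point at slow I/O lower down (the disk, or the host our VPS runs on), which is what I suspect.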
I have found the lab in unresponsive mode this morning. Restarting made it recover for a few seconds, then it passed out again. I have been doing a little bit of editing this morning, but I had to restart essentially every time before loading or saving a page.
But now I will have to go offline and will be absorbed much of the day. So I won’t be able to restart much. But I’ll try to come online now and then and see what I can do. Or maybe Andrew meanwhile has more to say. Or maybe the Lab recovers by itself.
I eventually got the Lab to come up, but then it has been really slow.
Our esteemed hosts said that there was a memory issue and they’d fixed that. But it doesn’t seem to have had an effect on the zombies (actually, given that they are going into “uninterruptible sleep”, perhaps “zombie” is the wrong term. What was Sleeping Beauty’s real name?). So I’ve asked them to look a bit deeper into it.
In the meantime, `killall -9 ruby` is a quick way to get rid of the comatose processes.
Trying to do a ’recently revised’ list I got the message:
Application error
Rails application failed to start properly
Thanks.
I tried `killall -9 ruby` a few times, but it didn’t bring the Lab back. On the other hand, it also throws “Operation not permitted” messages for a few (~four) ruby processes.
It’s working for me at the moment.
The `Operation not permitted` is expected. Doing `killall -9 ruby` tries to kill every process that matches `ruby`. But the nlab user is only allowed to kill those that it started. So the ones that match `ruby` but that nlab doesn’t own aren’t killed, and that error message is the one signalling that.
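A sketch of a way to avoid those messages altogether: restrict the match to nlab’s own processes up front, since `pgrep`/`pkill` can filter by user before signalling.

```shell
# List (without killing) the ruby processes owned by nlab.
# -u restricts the match to processes of that user, so nothing
# owned by anyone else is even considered.
pgrep -u nlab ruby || echo "no matching processes"

# Once that list looks right, signal exactly the same set:
# pkill -9 -u nlab ruby
```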
Yes, thanks, now it is back.
I’ve just added a little extra to the logging system which tells me the PID of the process in the log file. So if you happen to notice a comatose process with really excessive numbers (for example, if the VIRT column is of the order of 500m, or the TIME does not start with 0:), then if you mention it here I might be able to see if there are particular requests that link to these processes.
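In case it helps the reporting, here is a sketch of pulling the matching lines out of a log for a suspicious PID. The `pid=NNN` form and the log path below are assumptions for illustration, not the actual format of the new logging:

```shell
# requests_for_pid PID: read log lines on stdin and print those that
# mention the given PID. The "pid=NNN" pattern is an assumption about
# the logging format; adjust it to whatever the log actually writes.
requests_for_pid() {
    grep "pid=$1"
}

# Usage (log path hypothetical):
# requests_for_pid 12345 < /var/log/nlab/production.log
```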
Okay, thanks!!
Keeping an eye on the logs I noticed some interesting behaviour. I did a count of requests from particular IPs. This inspired me to the following hilarious joke:
Q: What’s the difference between Urs and Googlebot?
A: Googlebot doesn’t look at the nLab as much.
OOPS Down again! Yoyo: back again
I just did an experiment with loading `All Pages`. It looks like I might have to disable that. All of the comatose processes that I tested today were requests for `list`, which is `All Pages` or one of the other such lists.
Is the Lab down? It is taking a long time to load, and Azimuth seems to be down as well.
I have restarted the server. That brought it back. (At least for the moment…)
Thanks.
I came online and found the lab down a minute back. Have restarted the server, now it seems to work again.
Connection times out on my phone; I can’t read the Lab.
I had the same problem here; now restarted.
Have restarted it again. Had to restart a few times in the last days (but not in many weeks before that).
It seems to be down again.
Have restarted it again.
Down again.
Still down it seems.
Have restarted it.
(Hm, the moment the cache bug is removed, the no-response-bug seems to come back. Is this a coincidence?)
restarted
restarted
found the nLab unresponsive a few times today and restarted the server a couple of times. (Maybe problems are caused by increased traffic today?)
Right now it has a really bad moment. I can’t seem to bring it back for more than a minute.
It’s again misbehaving: down, and I can’t bring it back.
It’s sloowly coming back…
It’s sort of not down, but also sort of not up. It’s like walking through deep mud.
Is it down again?
It seems that this is a case where the whole server is busy with something else, for when I log into it, it responds sluggishly. But now it seems to be back in business. For the time being.
had to restart again, but now restarting did have an effect
(hm, so it went down pretty much at the same time as yesterday…)
It’s unresponsive again. Restarting doesn’t help. Again ~24 hours from when this happened last time.
hm, strange, it’s back, but it seems some pages were loading all along, while others were not. Strange.
I have not been able to see the nLab for about an hour.
That Guest was me! Now the n-forum is playing up: the pages are slow to come, and then are sometimes unformatted. (Some minutes later: that seems to have cleared for the moment.)
It’s again half past 9 CET and the Lab is unresponsive and restarting doesn’t help, just as in the previous cases above.
I am getting the impression this is a 24-hour periodic phenomenon. Some cron job or something is distracting the Lab each day after dinner (European dinner, that is).
The only cron job on the server happens at 6am, server time. It is now midnight on the nLab so that’s not the explanation.
More likely is a spike in traffic due to crawlers/search engines.
More likely is a spike in traffic due to crawlers/search engines.
Is there an easy way to give crawlers/search engines lower priority than genuine users?
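One partial lever that comes to mind: well-behaved crawlers honour robots.txt, so they can at least be slowed down or steered away from the expensive pages. A sketch (the Crawl-delay directive is honoured by some bots and ignored by others, including Google, and the path patterns are only guesses at which URLs are costly):

```
# robots.txt, served from the site root
User-agent: *
# Ask bots to wait this many seconds between requests
# (honoured by some crawlers, ignored by others)
Crawl-delay: 10
# Keep crawlers away from the expensive generated listings
Disallow: /nlab/list
Disallow: /nlab/recently_revised
```

That is not real prioritisation, of course; genuine load balancing would have to happen in the web server itself.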
and again: five past 9pm CET, and the nLab goes out of business. No restarting brings it back.
I wish those crawlers much fun with this cool website! If only I could mask as a crawler, too ;-)
21:06 CET and the question “What is the nLab?” has its daily simple answer:
it’s down.
(Or otherwise occupied.)
I take that back. One can see how it gets much slower from about 21:05 CET on, but today it still comes back every few minutes.
If only I could mask as a crawler, too ;-)
I'm sure that the crawlers are also getting very slow response times. But they are more patient than you.
(Since patience is a virtue, let me stress that I am not charging that you are deficient in it, Urs. The nLab should not require such patience.)
The nLab has been down for me so far all day.
@Rod: Here in the UK it has been ok.
The Lab is down, and I can’t restart it currently, on my new equipment.
Restarted and it seems back again.
Thanks!!
I find it very annoying that the nLab seems to go down almost every time I work on it (for example, right now, when I was trying to edit dg-scheme and derived algebraic geometry). Forgive my ignorance, but is there anything being done about this? If not, why not? Surely I am not the only person annoyed by this?
I have some experience administering Linux servers for some small companies and I would be happy to help in any way I can. First of all, is there any compelling reason to keep using Instiki? As far as I understand, it doesn’t seem very well suited to the nLab, and there are surely much better alternatives out there.
First of all, is there any compelling reason to keep using Instiki
I think that the amount of fine-tuning done on Instiki for our purposes by Jacques and Andrew is overwhelming; if we use something else, we would have to throw all of this away and deal with all kinds of migration glitches for years. Also, MathJax gives worse results than MathML. So what solution do you see, apart from Instiki, for a wiki supporting a reasonable subset of LaTeX and displaying MathML? Also bear in mind that the iTeX-to-MathML translation is also used by the nForum, and that the AzimuthProject uses the same software, also hosted by Andrew.
I think it would be much wiser to do some further tuning: having some services work from periodically updated data rather than always up-to-date data (e.g. “recently revised” pointing to a shorter list regenerated just several times a day); giving recognized robots low priority in serving (robots are, if I understood Andrew correctly, a large percentage of usage); having password-protected privileged users/contributors who are served with higher preference; and discarding the long generated list of “cited by” items at the end of each page (this generates lots of load not used by regulars; why not have a link to a special “cited by” report, which will be used by the few who want it).
My opinion is, of course, subject to corrections by Andrew.
There are also other wikis that support LaTeX and output MathML. For example, one option I know about is gitit. This also has the advantage of actually supporting LaTeX formulas, and not just a modified and partial version like itex. It’s also written in Haskell, which is in general many times faster than Ruby (which Instiki is written in).
As for glitches in migration, it would be possible to migrate smoothly by setting up an instance of gitit, for example, and testing it for some time on some subdomain while keeping Instiki on ncatlab.org. After any kinks are ironed out we could simply point ncatlab.org to the gitit instance. (Of course, we have to create a migration script, but since gitit supports Markdown, I guess this shouldn’t be difficult.)
I would be willing to set up an instance of gitit and work on migrating the content from the nLab, if I could have an SQL dump of the relevant database tables, and if afterwards the nLab maintainers would consider evaluating it as an alternative to the current setup.
It makes sense only if we develop a very good migration script. Otherwise, if you spend months migrating, a good fraction of the migrated pages would change in the meantime, so manual migration does not look appealing. But what are the advantages? What about the theorem etc. environments, floating tables of contents, and so on, which seem essential to some users? What about links to nLab pages and specific paragraphs in nLab pages we already have in MathOverflow etc. – would those work? Bear in mind that some works on the nLab are cited, and that many users count on being able to update their source code manually. Also, many users have a personal copy of Instiki where they develop things.
Of course, it makes sense to experiment and clone or partially clone. If there is another wiki which somewhat overlaps, one can have versions of related material; that is likely useful for the community. I do not see a reason not to have some pages in different environments, with links between the wikis. Of course, the learning curve has made many users stay away from contributing to our wiki, and changes may alienate some occasional contributors. Changes can make us work for the changes and not on our subject. I have a colleague who, for a couple of decades now, has every week been running some updates and reinstallations, new kernels and whatever, with big words about how great the new kernel or the new update of something is, and he never gets around to the scientific usage for which he installs the updates.
What about links to nLab pages and specific paragraphs in nLab pages we already have in MathOverflow etc. – would those work ?
We could make these continue to work using Apache, if necessary. (As long as the old and new file names never conflict, and they shouldn't.)
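For instance, with Apache’s mod_rewrite one could map the old Instiki-style URLs onto whatever scheme a new engine uses; a sketch, where the target path is entirely hypothetical:

```
# In the Apache configuration for ncatlab.org
RewriteEngine On
# Send old show-page URLs to the corresponding page in the new
# engine, preserving the page name so existing links keep working.
RewriteRule ^/nlab/show/(.*)$ /newwiki/$1 [R=301,L]
```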
Adeel: I notice here that gitit's TeX converter (ETA: texmath) implements primes in a way that I would consider poor, making them conflict with subscripts (f_2^n and f^n_2 are fine, but f′_n and f_n′ are broken). That's a minor annoyance … in fact, it looks like iTeX has the same problem! Still, now that I've spotted this problem in iTeX, I'm fairly confident that I can get it fixed (at least on the nLab) by talking to Jacques and Andrew. So my question is, could we have this level of control if we use gitit, possibly working with the designers of the TeX converter, possibly modifying our own version to suit us?
This also has the advantage of actually supporting LaTeX formulas, and not just a modified and partial version like itex.
Hm, the texmath test page
https://github.com/jgm/texmath/tree/master/tests
features tests which suggest that the coverage is far less versatile than what I’ve seen around with iTeX. For example, it has a special test for the stackrel command, while here we have the whole set of overset, underset, etc. options, with the possibility to shift the label on the arrow so that it is not moved too much toward the corner position, arrows in various directions, and so on. I don’t know, but the texmath translator looks to be about the same class as iTeX; I do not see why you say “actually” supporting if it is just a tool designed to handle a certain set of LaTeX math commands. Also an issue is whether we have the support of an expert (here we have the help of Jacques for the occasional critical issues or reasonable wishes). I hope the codecogs imports work fine.
Zoran: Yes, clearly the migration should be completely automated, and I think that this should be possible. I don’t think there will be any problem creating templates for theorems, etc. As Toby says, we could use Apache redirects to make old links continue to work. Gitit supports directories, so personal wiki functionality could be reproduced by putting pages in a personal directory.
Toby: while I don’t have any personal relationship with the author of gitit, I know that he is a very experienced programmer and has been doing a lot of good work in the open source community for many years. We would have to submit bugs or feature requests through the usual issue tracker system, but I would imagine that, due to the size of the nLab, the author may be willing to give higher priority to our requests (as we would be promoting his software by using it).
i do not see why you use “actually” supporting if it is just a tool which is designed to handle certain set of LaTeX math commands
Unless I’m mistaken, it really can parse any formulas that LaTeX can. itex does not even pretend to do that.
But what are the advantages ?
The advantage that a migration might offer is that it might get the nLab to a state where it displays and edits pages reliably and with sensible delays, without crashing several times per week, preferably including while being edited.
Unless I’m mistaken, it really can parse any formulas that LaTeX can. itex does not even pretend to do that.
Try to find out the citation. A formulation of the intended scope would be quite important to know.
I added the gitit link and the link to Haskell platform to our page Haskell.
the intended scope would be quite important to know
And also about the existence of, or experience with, any high-load, high-traffic deployment using heavy LaTeX code that comes within at least an order of magnitude of the nLab’s parameters in that sense. There are also issues of properly updating links (say, from non-existing to existing, etc., in referring pages) and of monitoring overall updates and history. How much has gitit been tested with users doing heavy LaTeX, heavy interlinking, etc., even offline?
Also, a migration should keep the history from the nLab and be able to roll back to a version corresponding to the source which migrate-mirrors some pre-migration old version of the nLab. We should not freeze possibly bad versions which accidentally happen to be current at the migration point in time, nor erase credit to earlier contributors, at least for the purposes of supporting discussion. (Saying that one can go to the old version of the nLab for that is not satisfactory, as future users will certainly not go to the pre-migration web for that, nor will accidental users.)
Zoran, I am not sure what you are trying to get at. If Adeel can try an alternative, let him try it, by all means. You are speaking as if we had much to lose.
Re #397:
If Adeel can try an alternative, let him try it, by all means.
I said the same as you, before, in 388:
Of course, it makes sense to experiment and clone or partially clone.
I think that it is wise to tell him, in advance, all the things which are important to have in mind. When somebody is writing or adapting software, having the true scope of the project in mind is crucial to making good choices, good tests, etc. I spent most of my programming time on one 2-year project which could have been done in half the time (edit: on reflection, a quarter of the time at most) if I had anticipated the problems in the early phase.
It is also important FOR US to know what to anticipate and what to commit to. Before my question in 386 we did not even know that gitit was the option Adeel had in mind. I was installing Instiki myself on my own computer without much trouble, and then, when I reinstalled a new version of Ubuntu, I had fatal glitches in the Instiki installation. I was thinking of redoing it next month; but now, if I know that somebody is considering an nLab migration in a few months, then I think maybe I should not go forward with that, but play with gitit instead.
You are speaking as if we had much to lose.
If things are done carelessly, we will spend more time for a worse effect than if we do it in an informed way. Not communicating technical issues as soon as we are aware of them is a way to lose, time at least. I agree.
I was installing Instiki myself
I also have in mind the planned installation of a wiki for math education in Croatian. We might get help and even some funding for it (and input from another database, on which I was a co-worker), so knowing the technical side of future plans is quite vital for spending my future time wisely. I think you should understand that the nLab is not the only database which is influenced by the content and experience of the nLab. This is one of the first large-scale projects in math wiki collaboration, hence a possible role model, especially for those who are already contributors or heavy users.
It’s always nice to get offers of help. But it’s nicer to get offers of the form “I’d like to help, I can do XYZ, what can I do?” than “You’re doing it all wrong. You should be doing ABC.”.
The major factor in the stability and speed of the nLab is the server. It would be nice to run the nLab on a better server. That’ll cost more. Currently, the nLab is funded by me.
The next factor is handling the requests a little more intelligently. Robots do overwhelm the nLab at times, and it would be nice to do a bit of load balancing to ensure that this does not block regular users.
Next, there’s the database. We’re using MySQL. I’m no expert, but I think that using another database could speed it up another notch.
Only then do we get to instiki. There are several areas where it could be improved and Jacques has identified them. Working on them would speed things up without the necessity of migration.
The idea of running the nLab on Haskell is tempting, but mainly for the “It’s Haskell!” effect. Other than that, I have no particular desire to switch from Ruby to Haskell. If the language of the application really were the main factor, we would compile everything to binary and throw out scripting languages. So without everything else fixed, this feels like work for work’s sake.
I took a quick look at texmath. The statement “This also has the advantage of actually supporting LaTeX formulas, and not just a modified and partial version like itex” is false. It supports a modified and partial version. Moreover, I consider `\newcommand` a daft idea in general, but particularly for a collaborative wiki. If you really want to use actual LaTeX, use my conversion class.
If anyone wants to experiment, of course they can and I’ll help in any way I can. But I am not going to hand over dumps of the database. There is a BZR repository of the nLab which can be used. This contains no private data but can be used to reconstruct the nLab. Because the nLab is so large, anyone who wants this should get the “seed” repository from me rather than pulling the entire lot in one go from the server. Then keeping it up to date is a light load on the server.