But it’s really not restricted to single entries. While typing this here, I have already been waiting a minute for AdS-CFT correspondence to show up.
And now that I am done typing this it is still not there…
Does it happen when you just view a page, or mainly after saving one?
It could well be that it happens after saving one. It’s hard to tell, because often I repeatedly save and view. I’ll try to pay attention to that and then report back.
I am not sure if I see any correlation with anything these days. The Lab is just very slow generally.
Andrew, don’t you have the same experience? Is something different on your end?
I don’t use it as much as you, Urs, but it never seems slow to me these days.
Hm, interesting.
Is the Lab slow, or down? I am in a hotel, so it may be the speed of the connection that is at fault, but other things seem to be working.
It’s not your connection. Same problem for me, for about two hours now or so, I am on a fast connection currently.
Me too, although my connection is not particularly fast now.
It’s still down. I can’t restart it any more; there is now password protection.
It seems to be working again.
Edit: ... but it's very sluggish.
It is still very sluggish. Is there any known reason for this?
There’s at least two bots crawling over the lab right now. That can slow things down.
Edit: make that at least four.
What are they? What bots do we typically get? (If you know.)
Do you also know if the bots usually crawl only the main nLab or also the public personal labs?
If the bots are identifiable as such, would it be hard to give them lower priority?
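If it helps to check: assuming the access log is in the standard combined format, a quick tally of user-agent strings would show who is doing the crawling. A sketch, using the same log file that appears in the greps later in this thread:

apache2# awk -F'"' '{print $6}' other_vhosts_access.log | sort | uniq -c | sort -rn | head

Anything calling itself a bot, crawler or spider should show up near the top if it is hammering the site.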
The Lab has been sluggish all morning. It is a sad thought that those best able to enjoy it are robots.
Is the nLab down?
It is ‘back’.
The lab is definitely down.
I can’t even ssh in to the server, so it’s a more fundamental issue than Instiki (just to forestall all those clamouring for changing to MediaWiki).
We are back.
Good.
I couldn’t even get to the web hosts’ page to restart the server, so it wasn’t even our little VPS that was having difficulties.
Yes, I know. It happened shortly after my last edit to stabilizer group. Right after making the last save there I tried to sftp into the server and got no response.
Does stabilizer group even exist?
Right, I meant stabilizer subgroup.
But now I have renamed the entry from stabilizer subgroup to stabilizer group. That is a proper title, which also applies to the higher case.
I also prefer stabilizer group to stabilizer subgroup (and even better, just stabilizer).
A few minutes ago, I tried to save edits to presentation+of+a+category+by+generators+and+relations. Now I can get no response from the lab anywhere.
[edit: it’s back]
I’ve just restarted the web server. I couldn’t see any particular reason for it to be taking so long to respond, and it’s late here so I didn’t feel like doing any deep digging. I’ll poke around in the logs tomorrow and see if anything turns up.
The lab seems to be down again. I logged in and tried to restart it, but sudo asked me for a password which I don’t have.
but sudo asked me for a password which I don’t have.
Ah, yes. After the last major upgrade, I forgot to reinstate the sudoers file that allowed the relevant people to restart the server. The fact that it’s taken you this long to find that out speaks quite well of basic nLab stability! (I agree that there are still times when it gets sluggish, but that’s not quite the same.)
Nonetheless, I should put that back in place.
The fact that it’s taken you this long to find that out speaks quite well of basic nLab stability!
Stability has been orders of magnitude better, that’s true. But I noticed that I didn’t have the new password when I needed it, many months ago.
I suspected that you got annoyed with how often I had restarted the server. If that is so, just tell me openly.
I suspected that you got annoyed with how often I had restarted the server.
No. What annoyed me was that I felt that sometimes the frequency with which the server had to be restarted was attributed to a failing in Instiki (and I admit that this could have been entirely in my head, except that Jacques also got that impression and complained to me a few times about it - justifiably so, since we’re a bit of a showcase for Instiki). The stability of the upgrade shows that actually it was due to a small modification that I’d made (for better monitoring, ironically) which I didn’t put back when I upgraded the system.
If the system is slow, then sometimes a restart can be warranted as it might be that some web crawler is taking up too many resources and a period off-line - which is insignificant for humans but not for spiders - can be enough to shake it off. And certainly a restart can’t harm the system - so long as we don’t have loads of people all trying to restart it at the same time. So I’ve no problem with the server being restarted from time to time. I’ll reinstate the sudo stuff.
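For concreteness, the “sudo stuff” amounts to one line of roughly this shape in the sudoers file, edited via visudo (the usernames and the restart command here are placeholders, not our actual setup):

# allow the named users to restart the web server without a password
urs, andrew ALL = NOPASSWD: /etc/init.d/apache2 restart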
Okay, I see.
It’s certainly true that I also took the previous behaviour to be a failing of Instiki. Glad to hear that it was not, and that you have sorted it all out!
Hi Andrew,
it’s maybe good that we finally had this exchange. Allow me to make one more comment on this here:
What annoyed me was that I felt that sometimes the frequency with which the server had to be restarted was attributed to a failing in Instiki (and I admit that this could have been entirely in my head, except that Jacques also got that impression and complained to me a few times about it
I can understand this, but maybe you and Jacques can also understand the reverse perspective: for the majority of users of the nLab, the same failing was necessarily attributed not to Instiki (of which they wouldn’t know anything) but to “the Lab”, which was the only thing they would perceive as being “down”. This is what made me, in turn, very nervous, especially once the point was reached where the nLab was no longer a little start-up. I felt we were in big trouble, and I am not sure how I should have handled that other than by pointing it out.
Anyway, I am glad that it’s been sorted out.
Right now, the lab seems to be down.
The server itself seems to be down - I can’t ssh into it at the moment. I’ll see if I can get into the VPS controls (last time this happened I couldn’t even do that - indicating that the problem was much higher up than us.)
Ah, well. Maybe one fine day we’ll have a mirror of the Lab installed…
Ah, well. Maybe one fine day we’ll have a mirror of the nLab installed…
Maybe it’s time to revisit that one. Once the forum migration is done.
The lab just came back! :-)
“Not our fault” is the official line (“our” = “us” = “nLab”). There was a spike on the server which made all the VPSs that use it inaccessible for a while.
The Lab is down, it seems, since a few minutes ago.
It seems more that it’s just really, really slow. But it’s not the nLab, it’s the connection. I just loaded the home page. My browser says it took 85s from request to first byte. The nLab claims it took it only 8ms to serve that request.
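In case anyone wants to reproduce that kind of measurement: curl can report the time to first byte directly. A sketch (the page is just an example):

curl -o /dev/null -s -w 'first byte: %{time_starttransfer}s, total: %{time_total}s\n' http://ncatlab.org/nlab/show/HomePage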
Okay, I see. Yes, I can connect to it now, too.
I’m getting no response now.
That’s not really true. When requested, a page starts to come, and I do at least get its title (in the HTML sense), but nothing else that my browser will render.
Working for me now.
I’ll look at the logs to see if there’s an explanation, but it sounds like a network issue.
It works for me now too.
It seems the Lab is down; I cannot reach it from IHES (I was editing an entry when it broke, and now it looks dead).
It is back!
Is it down? I am on a relatively slow connection, but…
Up as far as I can tell.
It is working for me now.
Over the last several page loads I started getting somewhat badly formatted pages (and loading also slowed down). I do not know if it is due to the server. P.S. The bad formatting persisted for only a few minutes; now it is OK.
I did not read all of the 254 posts in this thread, but what is the reason for the lab being down quite often?
The first post was about 3 years ago! So it’s not that often.
At the moment, my suspicion is that the lab gets overwhelmed by search bots from time to time and it isn’t sophisticated enough to sort out the difference between the bots and regular users. But I need to do some investigation to track this down.
It seems to be down now.
I have now restarted it.
This was the third time today. I also had to restart several times yesterday and the day before. Should I go back to announcing all restarts here, in order to get statistics and thus maybe a better chance to see what’s going on?
Down again!
Still down
I’ve just updated the robots.txt file to include a slew of things that these search bots shouldn’t be trying to access. We can see if that stops them bringing the Lab to a slow crawl.
(restarted as well)
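To give the flavour, the additions are roughly of this shape (a sketch, not the literal file; the paths stand for Instiki’s dynamic per-page views, which bots have no business fetching, and Crawl-delay is a non-standard directive that only some crawlers honour):

User-agent: *
Disallow: /nlab/history/
Disallow: /nlab/revision/
Disallow: /nlab/source/
Disallow: /nlab/print/
Crawl-delay: 10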
So it’s not that often.
Maybe my notion of “down” includes cases which are excluded from yours: for instance, when I wrote
I did not read all of the 254 posts in this thread, but what is the reason for the lab being down quite often?
I couldn’t call up the site for approximately two hours, but maybe other users could. There were other instances (fewer than 5) this week where the site reacted so slowly that it was hardly usable.
I read the “I did not read all …” sentence as “It looks like there have been 254 complaints about the nlab being down, why?”.
You could start reading the most recent ones and work back from there to find out what I think is the current cause.
Just in case it helps any diagnosis:
the nLab has been sluggish all day. I have restarted it once or twice when it didn’t react for minutes, but each time it comes back only to keep being very slow.
(I should quit working on it; I must not think of the time that this handful of edits is taking me…)
I have a little suspicion as to its sluggish behaviour today:
apache2# grep -c reddit other_vhosts_access.log
1685
and searching further, I found this post.
Looking a bit further at the logs, we’ve had twice as much traffic today as over the last few days: 70,000 hits as opposed to an average of about 40,000.
That probably includes counting all the extra bits that need to be loaded for a page, so that isn’t the number of page views. If I make the search a bit more refined, I get:
apache2# grep -c 'ncatlab.*26/Jun.*show' other_vhosts_access.log
28566
apache2# grep -c 'ncatlab.*27/Jun.*show' other_vhosts_access.log
28733
apache2# grep -c 'ncatlab.*28/Jun.*show' other_vhosts_access.log
20078
apache2# grep -c 'ncatlab.*29/Jun.*show' other_vhosts_access.log
25724
apache2# grep -c 'ncatlab.*30/Jun.*show' other_vhosts_access.log
60218
So that’s over double.
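For reference, the same counts can be produced in one go with a small loop (same log file and date patterns as above):

apache2# for d in 26 27 28 29 30; do printf '%s/Jun: ' "$d"; grep -c "ncatlab.*$d/Jun.*show" other_vhosts_access.log; done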
searching further, I found this post.
Ah, interesting!
It also shows that we are not making a good impression, alas.
Looking a bit further at the logs, we’ve had twice as much traffic today as over the last few days: 70,000 hits as opposed to an average of about 40,000.
Hm, so you think this is the cause? That would mean that if we reached an actual daily average of 70,000, that would be the end of it then?
Can we think again about how this might be solved in the long run? You once said that other sites don’t have this problem because they are running on stronger servers, right? (Or was it on more servers?) Is this an option for us? Can we (just checking, in principle) rent a “stronger server” somehow and get rid of this problem?
As you will have seen, they were maybe more interested in evil than in category theory, even. That reminds me of another problem…
I could always redirect traffic from reddit to a single page with a message something like:
It is most gratifying that your enthusiasm for our website continues unabated. And so we would like to assure you that the guided missiles currently converging with your ship are part of a special service we extend to all of our most enthusiastic clients… And the fully armed nuclear warheads are, of course, merely a courtesy detail. We look forward to your custom in future lives. Thank you.
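Technically, that would be only a couple of lines of Apache configuration; a sketch, assuming mod_rewrite is enabled (the target page is made up):

RewriteEngine On
# send anyone arriving from reddit to a single static page
RewriteCond %{HTTP_REFERER} reddit\.com [NC]
RewriteRule ^ /reddit-welcome.html [R=302,L]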
I’ve counted the “entry points” from reddit and yes, the vast majority of them are for the page evil.
Right. ;-)
But seriously, if we are now at the bounds of regular traffic being equivalent to a DoS attack, we need to think about what to do. Is it possible to upgrade the server power that we (you!) are currently renting?
Regular traffic includes bots, doesn’t it? I think if we introduce OPTIONAL logins, as was suggested before, then one could imagine a protocol in which the requests of logged-in users are assigned higher priority than all other requests, by non-certified/anonymous users and bots. How reasonable is this?
But bots or not: if we have an average of 40,000 hits per day now and at 70,000 hits per day the system stalls, then we need to think about how to make the system stronger (if possible), instead of how to keep hits away.
Because: we are interested in this site being crawled by bots, we are interested in it getting hits from curious reddit readers, we are interested in this site being seen, right? Otherwise why bother with it? So the long-term solution cannot be just to block users (even if they are bots). We should think about how to make the system stronger, so that it can serve the requests that it gets.
What would it take?
Probably not all hits take the same time to process. I am not more interested in the speed with which bots index the nLab on the web than in my own work. If my own work slows down by a factor of two, this is a big loss for me. If the nLab page indexing is, say, 6 days old instead of 2, it is not a big deal. I did NOT propose to block users, but to give precedence to your edits over the bots. Bots will slow down a bit and you will get precedence. I think that if we introduce optional login, it will be used by a small fraction of users, and we should certainly not give that kind of privilege to bots.
(You are proposing a Microsoft-like solution: buy a stronger computer instead of solving fundamental flaws. I had no problem with stalling windows when browsing on NeXT stations built in the early 1990s; kill always worked. Now, with 100-times-faster MS stations, we have stalls all the time, even when opening simple things like a text ssh window. Whatever the speed, why not have a hierarchy? That is good in any complex system.)
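For what it’s worth, a crude version of such a hierarchy is possible at the web-server level. A sketch, assuming a modern Apache (2.4) with mod_ratelimit enabled; the user-agent pattern and the limit are illustrative only:

BrowserMatchNoCase "bot|crawler|spider" is_bot
<If "reqenv('is_bot') == '1'">
    # cap responses to self-identified crawlers at roughly 64 KiB/s
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 64
</If>

That would leave human readers and editors untouched, while well-behaved bots (which identify themselves) are merely slowed down rather than blocked.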
The Lab is sluggish again. Restarting the server doesn’t change anything. In fact, even running the restart script is sluggish (the process dots appear slowly, one by one on the screen…)
If the server is also sluggish then this suggests that the slow-down is deeper than we can deal with and restarting the web server won’t do a lot.
Hi Andrew,
could you give us an impression, from your point of view, of what the options are for moving the nLab to a better server?
It’s something that I’m going to look into next year. One thing I’ll look at is the possibility of moving to a European server.
But I’d like to gather some hard data rather than just “It seems like a reasonable idea”. I want to know what we should actually prioritise and therefore what we should look for in a host.
I want to know what we should actually prioritise
Hm, okay. If you’d asked me, I’d say that once the ship is sinking, it is not the time to gather more data before taking action. But you’ll know this better.
I’d say it’s listing, not sinking.
The reddit situation was unfortunate, but it was a spike that we can’t really consider “normal” behaviour. The “per day” figures that I gave don’t tell the whole story, as it all occurred in the space of a few hours. Moreover, if you do a bit of searching, it isn’t just us that this happened to: detexify is another site that went down because of a reddit (or similar) spike.
The bots are annoying, but we don’t want to ban them altogether, since although the public aspect of the nLab may be secondary to some of us, it isn’t that much secondary (and to some it may not be secondary at all). So it would be nice to figure out a way to allow some requests higher priority than others.
The other thing to remember is that our experiences, as nLab authors, are atypical. Most “users” of the nLab only want to view pages and so experience things a bit differently to those who are constantly editing pages. Now, personally, I’d prioritise the authors over the readers, but it’s worth keeping in mind that not everyone sees the same picture.
Incidentally, our “google page rank” is (unofficially) now 6/10. I don’t remember what it was last time I checked, but I don’t think it was that high. Indeed, I think that the nForum actually outranked the nLab, whereas now the nForum is at 4/10. For (vague) comparison, the same site gave me stackoverflow as 7/10, and mathoverflow is also at 6/10. (I’ve no idea how reliable these figures are.)
That means we’re being noticed and linked to. That’s good. But I agree that we need to figure out how best to handle the increased traffic so I’m not taking it lightly by any means, I’m just trying to say that these are good problems to have!
(Oh and, PS: I’m only a few days away from moving for sabbatical so my ability to do stuff is somewhat limited at the moment!)
Have you considered using cloud computing through Amazon, or another service provider, to provide a back-up when the system is sluggish or down? I am not sure if this site is funded, or if the hosting is being done on someone’s personal dime.
Amazon: Amazon Elastic Compute Cloud
Google: Google Cloud (both enterprise level and smaller scale)
I hope this helps.
As someone who runs a big website, I find it extremely annoying the way people just throw around “cloud computing” as a silver bullet for all manner of hosting problems. It’s not that easy to build a scalable website, and cloud computing isn’t that cheap either.
On the other hand, I personally have used “cloud computing” (via Rackspace) to solve the problem of a web site that was overloading with users and going down. It was a bit of work to make the web application “scalable”, but not bad at all. (I do agree, though, that it wasn’t that cheap — cheaper than buying actual machines, of course, but not negligible.) Of course, when using existing software like Instiki, rather than an in-house setup like ours was, things may be more difficult.
We should start a new thread for discussing these things. I’m particularly keen to hear of people’s actual experiences.
Just now, I can’t get to the Lab at all.
It still is down.
It’s up now.
I’m on extremely limited access at the moment. Don’t know if I can get ssh.
I noticed that it went down yesterday, but it was shortly before I had to go offline. On the other hand, experience in the last weeks shows that restarting the server doesn’t help much. (I suppose it helped at some point, when the Lab still literally crashed and needed a reboot; but when, as apparently now, it is brought to its knees because it can’t keep up with user requests, then restarting probably rather makes things worse (?))
Down a few minutes ago.
Back, possibly some time ago.
Is something wrong with the search? I can load Lab pages, but every search stalls.
Search works now for me, at least.
Yes, it does now; it was only that day that it did not work.
The Lab seems to be down at the moment.
Works for me now
The Lab is imitating a snail! It has been several minutes since I asked for the Recently Revised list. I don’t think it is actually ‘down’. (Edit: In fact, the home page came up about 30 seconds after I typed that.)
I just restarted it. Something’s causing zombie processes at the moment. It might be an issue with the server - I’ll initiate investigations.