The Lab is down and I can’t bring it back: I have sent the usual restart command several times, but with no effect.
It has come back to us.
have had to restart it again
restarted
I have restarted it.
The Lab is responding to everything with a 500 Application Error.
I tried to view the page enriched bicategory, and it sent me to the edit page with a Ruby error message at the top that I’d never seen before (I wish I’d saved it now). I cancelled the edit, but it kept telling me the page was locked (by me), and soon it started giving the aforementioned 500 errors. I don’t know what’s going on, but I presume it needs restarting.
Okay, I’ve no idea what is going on there. I restarted the Lab, but that didn’t seem to make much difference. However, after restarting, the error messages were complaining about cookies, so I cleared out my ncatlab.org cookies and that seemed to get things back to normal for me. That particular page still goes bananas, but it seems to be the only page that does it.
I’m going to email Jacques about it to see if he has any ideas.
Some details, in case they help:
I’ve just tried removing those cookies, and the same thing happened for me as for you. I went back to enriched bicategory (to see if anything was amiss in the source) and the whole business started again. It seems clearing the instiki_session cookie gets things back to normal. I found I could edit a different page (I picked co-Yoneda lemma at random) and cancel the edit without any problems.
If I go to enriched bicategory I get the Ruby error message, and I’m told that I have the page locked. I’ve tried both ‘edit anyway’ and ‘cancel edit’ and both seem to result in the 500 errors. If I click neither then nothing goes wrong.
This is the error message I get when I try to access enriched bicategory:
ActionView::TemplateError (undefined method `children' for nil:NilClass) on line #15 of app/views/wiki/page.rhtml:
12: </p>
13: <%= @renderer.display_diff %>
14: <%- else -%>
15: <%= @renderer.display_content %>
16: <%- end -%>
17: </div>
18:
lib/chunks/engines.rb:74:in `new'
lib/chunks/engines.rb:74:in `mask'
lib/chunks/engines.rb:20:in `apply_to'
lib/wiki_content.rb:181:in `build_chunks'
lib/wiki_content.rb:157:in `initialize'
lib/page_renderer.rb:140:in `new'
lib/page_renderer.rb:140:in `render'
lib/page_renderer.rb:29:in `display_content'
app/views/wiki/page.rhtml:15:in `_run_rhtml_app47views47wiki47page46rhtml'
app/controllers/wiki_controller.rb:353:in `show'
passenger (2.2.15) lib/phusion_passenger/rack/request_handler.rb:92:in `process_request'
passenger (2.2.15) lib/phusion_passenger/abstract_request_handler.rb:207:in `main_loop'
passenger (2.2.15) lib/phusion_passenger/railz/application_spawner.rb:441:in `start_request_handler'
passenger (2.2.15) lib/phusion_passenger/railz/application_spawner.rb:381:in `block in handle_spawn_application'
passenger (2.2.15) lib/phusion_passenger/utils.rb:252:in `safe_fork'
passenger (2.2.15) lib/phusion_passenger/railz/application_spawner.rb:377:in `handle_spawn_application'
passenger (2.2.15) lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
passenger (2.2.15) lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
passenger (2.2.15) lib/phusion_passenger/abstract_server.rb:163:in `start'
passenger (2.2.15) lib/phusion_passenger/railz/application_spawner.rb:222:in `start'
passenger (2.2.15) lib/phusion_passenger/spawn_manager.rb:253:in `block (2 levels) in spawn_rails_application'
passenger (2.2.15) lib/phusion_passenger/abstract_server_collection.rb:126:in `lookup_or_add'
passenger (2.2.15) lib/phusion_passenger/spawn_manager.rb:247:in `block in spawn_rails_application'
passenger (2.2.15) lib/phusion_passenger/abstract_server_collection.rb:80:in `block in synchronize'
<internal:prelude>:10:in `synchronize'
passenger (2.2.15) lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
passenger (2.2.15) lib/phusion_passenger/spawn_manager.rb:246:in `spawn_rails_application'
passenger (2.2.15) lib/phusion_passenger/spawn_manager.rb:145:in `spawn_application'
passenger (2.2.15) lib/phusion_passenger/spawn_manager.rb:278:in `handle_spawn_application'
passenger (2.2.15) lib/phusion_passenger/abstract_server.rb:352:in `main_loop'
passenger (2.2.15) lib/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
I’ve no idea why that error was being thrown up, but I’ve tracked down what was causing it and fixed it on the page. There was a segment:
* S. M. Carmody, _Cobordism Categories_ , PhD thesis, University of Cambridge,
1995.
which was causing the problem. For some unknown reason, the syntax:
* a
1.
causes Instiki to have the heebie-jeebies. I’m guessing this is due to some update between April and now: in my experiments, Instiki wouldn’t let me save a page with the above syntax, so that syntax shouldn’t get into the system any more (though if you don’t know why it won’t let you save, it could be very frustrating!). But the syntax is already in the system, so this bug can’t have been in place when the page was saved.
Anyway, the fix is to take out the newline:
* S. M. Carmody, _Cobordism Categories_ , PhD thesis, University of Cambridge, 1995.
which I’ve done, so enriched bicategory should now work.
(The tricky bit is that with linewrapping on, it’s not always obvious that there is a newline to be taken out.)
I’ve reported this to Jacques.
I found two more instances: at sketch and ideal completion
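Instances like these could in principle be hunted down mechanically, by scanning a page’s source for a bullet line whose wrapped continuation begins with a number and a dot. A minimal sketch in Ruby (the function name and the exact pattern are my assumptions, not anything from Instiki):

```ruby
# Hypothetical scan for a "* item" line whose wrapped continuation
# starts with digits and a dot -- the pattern that trips up Instiki.
def suspicious_spans(source)
  lines = source.lines.map(&:chomp)
  hits = []
  lines.each_cons(2).with_index do |(a, b), i|
    # flag the continuation line (1-based) when it follows a bullet
    hits << i + 2 if a =~ /^\s*\*\s/ && b =~ /^\s*\d+\.(\s|$)/
  end
  hits
end

sample = "* S. M. Carmody, _Cobordism Categories_ , PhD thesis, University of Cambridge,\n1995.\n"
puts suspicious_spans(sample).inspect  # → [2]
```

Run over all page sources, something like this would have pointed straight at enriched bicategory, sketch, and ideal completion.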
So that people not following this thread can know about this, I’ve posted at:
http://www.math.ntnu.no/~stacey/Mathforge/nForum/comments.php?DiscussionID=2805
restarted.
Thanks, I was just about to.
have restarted it
I have not been able to access the nLab for about the last 15 minutes.
Restarted.
It is not entirely down, but it is at the boundary of bearable slowness at the moment. It needs a few minutes for HomePage-type downloads. It is about to stall completely.
Just restarted it.
have restarted it
have restarted it (15 min ago or so)
restarted
restarted
Just restarted it.
restarted
it does not come back, though [edit: now it’s back]
restarted
restarted
I still have a problem.
okay, I have restarted again.
and once more
and yet once more
Looks like something contagious today…crazy here as well with the conference preparation…
restarted it (also two days ago)
But I won’t be able to look after the restarter-job much for the next 1.5 weeks. Officially I am on vacation! :-)
restarted
restarted
restarted
restarted
Incidentally, I’m back from holiday now so will be able to take over the restarting-reins. I shall also try to put aside some time to take a really good look at this in the near future.
thanks Andrew
(and I have restarted once more)
restarted
restarted
I was watching that one, having just clicked on unitization to figure out what it was. There were a couple of atom_with_content requests which ate up a lot of resources. If these aren’t cached, they can take on the order of a minute to process. In the last 12 hours, we’ve had 150 such requests (not distinguishing between cached and not cached).
I’ve generally taken the attitude that anything that involves processing every page in the nLab should be discouraged. We can produce a daily atom feed from the bzr repository simply by dropping another of Jacques’ scripts in place.
(This should be in a new discussion …)
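If uncached atom_with_content requests are the culprit, one cheap mitigation would be a time-based file cache in front of the feed generator. This is only a sketch under assumptions — the file name, the one-day lifetime, and the build_feed stand-in are mine, not the actual nLab setup:

```ruby
# Serve a cached copy of the expensive feed if it is fresh enough;
# regenerate it otherwise. All names here are hypothetical.
CACHE_FILE = 'atom_with_content.xml'
MAX_AGE    = 24 * 60 * 60  # seconds: regenerate at most once a day

def build_feed
  "<feed>...</feed>"  # stand-in for the real walk over every page
end

def cached_feed
  fresh = File.exist?(CACHE_FILE) &&
          Time.now - File.mtime(CACHE_FILE) < MAX_AGE
  return File.read(CACHE_FILE) if fresh
  feed = build_feed
  File.write(CACHE_FILE, feed)
  feed
end

puts cached_feed[0, 6]  # → <feed>
```

With something like this in place, 150 requests in 12 hours would trigger at most one expensive regeneration.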
Ah, good! So maybe that’s the reason? That sounds like it should be comparatively easy to fix?
Continued discussion on atom_with_content.
had to restart again
So why is it processing every page? Why is it not done with messaging, as I suggested in another thread? Is it difficult?
I have typed http://ncatlab.org/zoranskoda/source/my%20articles and immediately got the Internal Error. Then I tried http://ncatlab.org/zoranskoda/source/my+writings, called from the page, and had no Internal Error. What makes the difference? I could do that routinely before.
restarted
Zoran: I think that the source view does not take into account redirects. Since my articles (zoranskoda) is a redirect to my writings (zoranskoda), the source view only works for the latter and not for the former.
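If that diagnosis is right, the fix would be for the source view to resolve redirects to the canonical page name before looking up the page, the way the show view presumably already does. A toy sketch (the redirect map and the method name are hypothetical, not Instiki’s actual data structures):

```ruby
# Follow redirects to the canonical page name before fetching source.
# The map below is a hypothetical stand-in for Instiki's redirect data.
REDIRECTS = { 'my articles' => 'my writings' }

def resolve(name)
  seen = []
  until seen.include?(name)  # the seen-list guards against redirect cycles
    seen << name
    break unless REDIRECTS.key?(name)
    name = REDIRECTS[name]
  end
  name
end

puts resolve('my articles')  # → my writings
puts resolve('my writings')  # → my writings
```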
So why […]
I am guessing the programmers did it the quick and easy way. Notice that (as shown at the very bottom of every nLab page) the Instiki software here is marked as version number 0.19 only.
We should pay some professional programmer to boost up the code. But I don’t know how to get started with this.
I am guessing the programmers did it the quick and easy way.
I do not know about the implementation details here, but abstractly the algorithms for messaging and for replacements in a short list are rather simple as well. Of course, the way other things are implemented may interfere.
It seems the lab is down now.
We should pay some professional programmer to boost up the code.
While this must be done at some point for some things, I am still a bit uneasy with this as a real solution. I mean, the programmer could do something like that and leave. And then in the next phase it is harder to change anything without that person around, unless he documents the changes exceedingly well. So for most of the changes it is always better to have somebody who is accessible more permanently, someone in house. At some point we had a couple of volunteers (like that French guy who was doing something with Joyal) here asking how to contribute, and they were advised towards rather useless (or at least secondary) tasks like the database of categories. One should also announce the need on John’s Azimuth Project (see the Azimuth Forum), where there are a number of engineers occasionally discussing and contributing, and which uses the same platform.
On the other hand, maybe many could help if the overall structure of the system were clear. I mean, there are several different pieces of software, from the database to Ruby, each of which has documentation, but it is not clear to me how in practice to even see each of the systems (where the pieces are located, who calls whom, etc.).
I have restarted it.
Hm, didn’t help. I have restarted it again.
I mean, the programmer could do something like that and leave. And then in the next phase it is harder to change anything without that person around, unless he documents the changes exceedingly well.
I think that’s the whole point of paying somebody: we can decide exactly what’s supposed to happen, there is then a contract which states this and if the contract is being fulfilled, there is money being paid. We can say what it is we need. If the programmer does something we cannot use and argues that it’s meant to be a feature, we can say: great, but that’s not a feature we want.
Of course it would be much better to have a dedicated volunteer. But in the absence of one, this is the next best thing that I can think of. I mean, clearly Jacques and Andrew, and maybe others behind the scenes whom I don’t know, are dedicated volunteers who have done an amazing amount for the project; but apparently it now takes just a bit more time than volunteers can invest.
I mean, we should face reality instead of dreaming about what might happen if some volunteer came along: we have a large project here which we are investing lots of energy into, and at the same time the software is shaking and regularly crashing beneath our feet. We currently only survive with this software at all because we keep turning things off: lots of useful features that you’d expect wiki software to have, we have to disable for it to work at all. We can consider ourselves lucky if it does the most basic job of a wiki for about 24 hours in a row without needing intervention: to display pages.
I am not sure about you all, but here is why I am very concerned about this situation: as time passes I have accumulated a great deal of files and other material on the nLab that is there in order to be available. This concerns all my activities, my research, my students. If I come to the nLab in the afternoon, like today, only to realize that it has been down for half the day, I always think: why oh why do I leave my most precious stuff with this software that does not run?
And why do I? Because we are hoping that some day some volunteer will come and save us all? It’s a nice dream, but it seems to be just a dream. Instead of dreaming on, I want to do something about it.
I could in principle just get into the code and re-program it myself. Eventually I could do that. But it would take me so much time to get to the point where I’d be useful that it spoils the whole point of it again.
So the next best thing is to pay somebody. I guess.
I think that’s the whole point of paying somebody: we can decide exactly what’s supposed to happen, there is then a contract which states this and if the contract is being fulfilled, there is money being paid.
You did not understand my complaint: the issues with upgrading software come much later, much later than you pay. And it is not a matter of money, but of being able to find one’s way around software somebody else has built. Once we get a flux of temporary people, nobody will have a chance to really know the system. And my other complaint was: please do not repeat the error of telling chance volunteers to do second-order stuff, like the database of categories.
I discussed this behind the scenes with a friend who is a professional programmer and consultant with a physicist’s past, to estimate the amount of work here. He told me that about a week of hard work would be needed just to locate the problems and to make an informed estimate for the rest. He glanced through the bulk of the Instiki source code and asked me about various size issues and phenomena in the problem: when the nLab goes down, which parts of the system go down; how the RAM usage spikes during critical processes; whether we suspect memory leaks in the code; the size of the relevant source code; the parts of the system; the kind of machine we use; etc. I could answer just a tiny fraction of that.
Maybe it would be good to have a detailed technical summary of our installation – all parts of the system, with size, nature, and links to further info when available; the meta pages of the nLab and Instiki show just a portion of that, and the big picture is not easily extracted. Then one could see who is able (even among programmers) to solve the bug and other principal requests (dealing with the slowness of certain update processes when generating a new page, and so on) in a reasonable amount of time.
My friend is also asking whether it is possible to have a test installation of the whole system to play around with during the process.
And it is not a matter of money, but of being able to find one’s way around software somebody else has built.
There are standard methods for dealing with that. And my understanding is that instiki has been developed by several people this way all along.
please do not repeat the error of telling chance volunteers to do second-order stuff
I never did that, Zoran. The “chance volunteer” whom I think you mean did what he himself wanted to do. He was only asking for advice here about how to do it.
I discussed behind the scene with a friend who is a professional programmer and consultant with physicist’s past to estimate the amount of work here. He told me that about a week of hard work would be only to locate the problems and to do an informed estimate for the rest.
How much would he charge for “a week of hard work”? Just hypothetically. To get an idea of what order of magnitude of costs to expect.
restarted
had to restart again
Professionals at consultant level charge gross (before tax) roughly 80–85 euro per hour of effective work on complex problems. Lower-level programmers do it for about a third of the price, with about the same productivity for the same money and with somewhat less reliability. There are also collective programming houses which charge by working time as measured by shared office hours, which means they charge you for idle time as well, but those are nominally cheaper (in the range of 15–35 euro, and they will charge for their slow learning of the problem). The guy I know is more on the high end: very quick, very reliable, and very experienced. But before hiring anybody, I think we should first work hard on making a scheme of the relevant software, isolating the problems which need a professional, and planning a strategy for how this will solve our problems permanently. And Andrew will probably say how big the problems are as he sees them.
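Taking those figures at face value, the order of magnitude for the one-week diagnosis alone works out as follows (assuming a “week of hard work” means about 40 effective hours, which is my assumption):

```ruby
# Back-of-the-envelope cost of the initial one-week diagnosis,
# using the consultant rates quoted above.
hours     = 40      # assumed effective hours in "a week of hard work"
low, high = 80, 85  # euro per hour, consultant level, before tax
puts "#{hours * low}-#{hours * high} euro"  # → 3200-3400 euro
```

So the diagnosis alone would already run to a few thousand euro, before any actual fixes are made.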
Okay, thanks, that helps. There is also still the problem that entries won’t save. I just spent 20 minutes or so just trying to make differential cohomology in a cohesive topos (schreiber) point to a new version of the file.
restarted
didn’t have an effect. restarted again.
Restarted (soft).
Restarted again; this time needed a hard restart.
restarted
(I have always been using the “hard restart” since the “soft restart” never showed any effect the first few times that I tried it. Is that important?)
The soft one is faster, and Andrew long ago suggested using it first. It has worked for me on occasion.
I have restarted it
(I forget, though, what Andrew said – if anything – about what the difference is and whether it is important that we try the “soft” one first. Trying that first adds yet a bit more to the list of things to do.)
The Lab has been running stably for weeks now, ever since Andrew updated the server.
But every now and then it still becomes very slow. Currently it is sluggish. But it is awake!
It’s down. I am not sure if I can still restart the server?
Oh, it’s back.
The down-times seem to be increasing again.
Is this maybe a hint for some long-term accumulative process that causes problems? Which was maybe reset when the server was updated recently, and is now again starting to show up?
Hard to say without detailed investigation. Sometimes webservers just do go slow. We get search bots and spiders crawling over the nLab (which is, generally, a good thing) and if those happen to coincide with when you’re working on something then it can seem that the system is slow when really it’s just doing a lot of things at once.
If you spot a pattern then there might be more to it. Although Jacques has introduced some speed-ups to the code, Maruku is still (from what I understand) the rate-determining-step. But we don’t appear to have loads of volunteers lining up to help develop its successor …
I’d also like to know what others from other parts of the world experience. It could be that the search bots are prevalent at night, but that might mean night in the USA! Which is a bit bad for us here in Europe. Also, if the predominant usage is based in Europe, maybe we should shift the server over here …
It’s unlikely to be anything accumulating internally, but it might be that when it was offline for a day, the search engines (and some humans) stopped coming, and it’s taken them a while to catch up. But that’s pure speculation.
(Still, I’m pleased that we’re talking about slow downs rather than crashes. The nLab hasn’t crashed again since the reinstall.)
The nLab hasn’t crashed again since the reinstall.
Yes, it’s great indeed. I did applaud that in other threads, I think.
we don’t appear to have loads of volunteers lining up to help develop its successor …
For precisely that reason I kept bringing up the idea of paying a professional programmer for the task. What stops me from doing this right away is not so much the costs (it would be worth it) but that I am not sure how to really go about it.
But you will probably know. Suppose we can accumulate something like – what will it take? – two thousand dollars (three, four)? I imagine we go to some software house and say: half of this is yours right away, the remaining half if by the end of this week you have produced a brush-up of this piece of open source software here, which is suffering from a bunch of teething problems.
Would that not be a plan?
I think it would be cheaper if we got some students to do it! I am at the Norwegian University of Science and Technology, after all; I ought to be able to find some who might help. I’ll do some asking around and see if there’s some way of suggesting projects for students: they often have coding projects to do as part of their degree, and it may be that we’re allowed to suggest ideas.
Urs, we do not even have a chart of the software used, its relations, typical sizes of files, data, databases, etc., which one would need in order for any professional to even estimate the job before signing a contract. The size of Instiki is such that it is nontrivial to extract which parts are relevant for such a questionnaire. It is also nontrivial to find a person who knows Ruby etc. and to communicate exactly what we want from them. Imagine that the Instiki source code alone is about 150k lines.
Zoran,
Jacques has identified a single self-contained task that would greatly speed up the page rendering. It wouldn’t need an overview of the whole process. I posted something about it a week or so ago, but no-one responded here (one person thought about it over at Azimuth).
In which discussion ? Would you be so kind to quote the permalink ?
we do not even have a chart of the software used, its relations, typical sizes of files, data, databases, etc.
I imagine there must be people who would also figure that out, for payment.
If one needs to figure out such things first, this slows things down by many days, which makes it much more expensive.
this slows down by many days
But this is why I am thinking about paying somebody in the first place: I could myself spend many days and sort this all out. Or anyone else here could. Since we don’t want to spend our days on this, let’s look for somebody who would do it for money.
Andrew and the others know what we have. They already spend a lot of time on things which are more difficult than stating what they know as facts. Programmers do not take on a task if they do not know its size, unless you pay them by the hour, in which case you cannot know how much money it will take. A true estimate of the general problems we have takes several days, and even then we will need to help with input, precise wishes, past experience reports, etc. If you take a low-experience person, this will not come out right; if you take an experienced one, it will be a higher price. So why not isolate things ourselves, so that we are in control of the size of the expenditure? You are talking as if you had an unlimited fund for this and could repeat the process if it does not turn out right.
Somebody who knows the system would spend much less time defining the features of the system. Somebody who knows the usage characteristics would spend much less time defining the features of the desired solution. Somebody who is an expert in programming would want both of these as input, and would then write a design of the solution faster than others would.
Urs, could we talk about this by Skype now? (I cannot find your phone number on the web.)
Zoran, it was here: http://www.math.ntnu.no/~stacey/Mathforge/nForum/comments.php?DiscussionID=2996
(and you took part in the discussion!).
May I suggest that this conversation resume there, and that this thread be used for reporting crashes and slow-downs? Every time I see that there’s a new post in this discussion, I think that the Lab has crashed again!
the Lab is not responding…
I didn’t do anything, but it’s working for me now.
I think it’s fairly safe to say that the lab no longer goes down. What does sometimes happen is that it gets very busy with lots of requests, and sometimes it can take so long to process one that it appears to have stalled. After a short while, the backlog clears and it’s back as normal.
After a short while,
Yes, I figured it would come back. But I thought that for debugging purposes it would still be good to record when it is down. When I went offline it hadn’t been responding for 13 minutes. By the date of Toby’s reply, it came back somewhere within the next 47 minutes.
Yes, it is still worth recording that. My apologies for being a bit snappy.
I don’t want to be trouble… but the nLab hangs a lot. For minutes, repeatedly, many times a day. It does come back, yes, but it’s not much fun working with it.
Could you “keep a diary”?
This has the feel of something tying up the processes with Big Requests (such as listing all pages and so forth). But to know what to look for in the logs, I need to know roughly when to look.
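A diary like this could even be kept automatically. Below is a minimal probe sketch in Ruby; the URL, the five-minute interval, and the log file name are my assumptions, not anything agreed here:

```ruby
require 'net/http'
require 'time'

# Record one timestamped line per probe: the HTTP status (or the
# error class, if the request failed) and how long it took.
def probe(uri)
  t0 = Time.now
  status = begin
    Net::HTTP.get_response(URI(uri)).code
  rescue StandardError => e
    "ERR #{e.class}"
  end
  "#{Time.now.utc.iso8601} #{status} #{(Time.now - t0).round(2)}s"
end

# One possible way to run it, appending to a diary file every 5 minutes:
# loop do
#   File.open('nlab-diary.log', 'a') do |f|
#     f.puts probe('http://ncatlab.org/nlab/show/HomePage')
#   end
#   sleep 300
# end
```

A log like that would give rough timestamps of the slow spells, which is exactly what is needed for matching against the server logs.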
Could you “keep a diary”?
I can do that.
Maybe others should do it too. Sometimes I feel a bit weird being the only one recording non-responsiveness of the nLab. Sometimes I think: maybe it’s just me and my system, and you are all just shaking your heads about me, happily making rapid-fire calls to the nLab meanwhile.
It happens at intervals that I use the nLab continuously for hours, and then again not at all for a long time. So on my own I won’t be able to give a good overview of what’s going on.
I will keep a lookout, but in the flat that I have in Aix-les-Bains the connection is virtually non-existent, so I can only comment during the time I am in the department at the university.
Urs (192): it’s not just you. Of course I don’t use the Lab anything like as much as you do, but I do find that now and again it’s slow to respond, occasionally so slow that it appears to be totally unresponsive. I’ll try to remember to record the time and date on the next occasion.
Okay, thanks for letting me know.
I know of one colleague who told me that he doesn’t like to look up things on the nLab “because it’s so slow”. But then, he was on the same kind of connection as I was, so I couldn’t tell whether maybe the rest of the world is happy with the responsiveness.
14:47 (my time), it does not respond…
14:50, it’s back
it’s back
but only for a second. My next request for a page again took 2 minutes.
Are you working on holographic principle by any chance?
Was so a few minutes ago.