• CommentRowNumber1.
• CommentAuthorTobyBartels
• CommentTimeSep 5th 2009

This is still quite slow and stops anything else from happening.

I'll test tonight, when nobody else is on, to see just how long it takes.

• CommentRowNumber2.
• CommentAuthorTobyBartels
• CommentTimeSep 5th 2009

For the record, I did touch ~/instiki/tmp/restart.txt shortly after 21:30 (UTC) because this had slowed things down so much. (I'm not sure that this made a difference, however.)

• CommentRowNumber3.
• CommentAuthorAndrew Stacey
• CommentTimeSep 7th 2009

I'd like to know if it did make a difference, because it should have done.

That 'export html' is enabled in the migrated lab is due to oversight, not design. The method of blocking in the old lab was at the web server level, and I didn't look too closely at those files (partly because we're using different server software). I'm minded to re-enable the block, since there is absolutely no reason for anyone to do an 'html export'. If you must have a local copy, consider the wget method described elsewhere on this forum, since that does incremental copies.
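
For concreteness, the wget method might look something like this; the URL is hypothetical, not the lab's actual address.

```shell
# A sketch of the incremental wget method (hypothetical URL).
# -m (mirror) turns on recursion and timestamping, so a second run
# only re-fetches pages that have changed on the server;
# -np stops wget climbing above the starting directory;
# -w 1 waits a second between requests to go easy on the server.
wget -m -np -w 1 --convert-links "http://example.org/nlab/show/HomePage"
```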

• CommentRowNumber4.
• CommentAuthorTobyBartels
• CommentTimeSep 7th 2009

> I'd like to know if it did make a difference, because it should have done.

Things were still slow, but they got better a few minutes later; I don't know if they would have gotten better on their own (although I did wait a few minutes before issuing the touch).

I agree with disabling the Export HTML, but I was in a mood to just push all of the buttons to see what would happen.

(And I never got a chance to test when everybody else was away.)

• CommentRowNumber5.
• CommentAuthorAndrew Stacey
• CommentTimeSep 7th 2009

> I agree with disabling the Export HTML, but I was in a mood to just push all of the buttons to see what would happen.

Ah, okay. That's an acceptable reason for trying. Jacques' reasons for disabling it are valid whatever hosting service we're on. Just to reiterate: doing an 'export html' requires recomputing every page - all 2000 of them - and then sending them in one big batch! This takes place on the server. Doing a 'wget', even if you do it for the first time, only requires recomputing the pages that aren't cached, and the pages are sent one by one, so there's no great strain on the server.

I'll disable 'export html' next time I'm awake enough to figure out what I'm doing.

• CommentRowNumber6.
• CommentAuthorAndrew Stacey
• CommentTimeSep 18th 2009

I've finally woken up enough and have disabled all the export features. It turned out that Toby wasn't the only one using this feature; someone called 'yahoo' was doing it too, and not just once either. I've also disabled the export_markdown feature. I figure that if anyone really wants the entire markdown set then they'd be better off requesting it directly from me, and I can generate it straight from the database (since it needs no processing).

Exporting the HTML can take 15 minutes and causes a memory spike of about 50 MB. Doing it twice in a row (possibly more) puts a significant load on the server. Much better to use the wget method.

• CommentRowNumber7.
• CommentAuthorTobyBartels
• CommentTimeSep 18th 2009

> I've also disabled the export_markdown feature.

Wow? I've been doing this regularly, and it doesn't cause a slowdown. (Since, as you said, it needs no processing.)

• CommentRowNumber8.
• CommentAuthorAndrew Stacey
• CommentTimeSep 18th 2009
• (edited Sep 18th 2009)

I'm hoping that with our current system, slow-downs are a thing of the past. I've been slowly increasing the number of concurrent processes that can run on the system. We're now up to five, and we're using something called a 'global queue' which is the Right Sort of Queue (being British, I know all about queues). So now when you do something that takes a lot of time or memory, it simply blocks up one of the processes but all the others are free to deal with other requests. The only time it becomes a problem is when all the processes get tied up, but for that we'd need five horrible requests all in one go.

There is another issue, though, which is memory usage. Instiki doesn't seem to have actual memory leaks, but certain requests cause it to ask for a largish share of memory which it then fails to hand back afterwards. Jacques and I are trying to track these down, but it's a slow process. The reason it's good to get rid of these is that the amount of memory is what limits the number of concurrent processes. I need to make sure that we're not going to hit our hard limit, since that's when things start going horribly wrong. I'd also like to keep us below our soft limit, at least most of the time. The ideal situation has us operating a little below our soft limit the vast majority of the time, occasionally going above when we get one of these memory-intensive requests and then, this is the crucial bit, going back down again. At the moment, once we're up we stay up until something recycles the processes.

I've not looked at 'export_markdown' particularly to see what its memory usage is, but although the individual pages need little processing, there's still work for Instiki to do: getting each page and stuffing them all into a tar archive. It just seems a daft use of the Instiki process when a simple mysql command on the command line (which you have access to) would do the same without tying up any resources for anyone else, and without involving the web server. Sending big files over http is daft; use scp or sftp instead, since they're designed for that and are much more robust when it comes to partial downloads and the like.
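
As a rough illustration of the command-line alternative (the database, user, table and column names here are guesses, not Instiki's actual schema):

```shell
# Hypothetical sketch: pull the latest markdown source of every page
# straight from the database, bypassing the web server entirely.
# Adjust the names to match the real schema before use.
mysql -u wikiuser -p wikidb -e \
  "SELECT content FROM revisions
   WHERE id IN (SELECT MAX(id) FROM revisions GROUP BY page_id);" \
  > nlab-source.txt

# Then fetch the file with scp rather than over http:
scp wikiuser@example.org:nlab-source.txt .
```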

I'm also strongly tempted to redirect the 'atom_with_content' feed to 'atom_with_headlines', since I find it hard to imagine that anyone actually looks at the content in their feed reader. You get the whole page, which means that you don't immediately see what's changed; moreover, it's generally mangled by the reader, since very few can handle MathML. Far better just to see that a page has changed and then nip over to the lab to see what the actual change was.

However, whilst I don't anticipate too many complaints at disabling the exports, I fully expect a raft of complaints if I muck around with the RSS feeds, so I'll probably do that and then go on holiday for a week!

• CommentRowNumber9.
• CommentAuthorTobyBartels
• CommentTimeSep 19th 2009

I don't really like the idea that you need special permission in order to get your own copy of the entire nLab. (Of course, I probably wouldn't be complaining if the Export options had never been there!)

In any case, I would appreciate it if you would tell me how to do the SQL query from the command line. I don't really know SQL (I can usually read it, but I can't write it without looking things up), and I don't know what Unix command to use.

• CommentRowNumber10.
• CommentAuthorAndrew Stacey
• CommentTimeSep 28th 2009

> I don't really like the idea that you need special permission in order to get your own copy of the entire nLab. (Of course, I probably wouldn't be complaining if the Export options had never been there!)

No, nor do I. But I consider the export_markdown option to be a daft solution to the problem (with apologies to Jacques if it was something he designed!).

> In any case, I would appreciate it if you would tell me how to do the SQL query from the command line. I don't really know SQL (I can usually read it, but I can't write it without looking things up), and I don't know what Unix command to use.

I'll look that up next time I'm logged in with a minute or two to spare. In the meantime, I have a suggestion for a better solution to the problem of getting a copy of the source of the n-lab, which also fixes a couple of other things that I worry about.

What if each page had a 'view source' option? Then you could grab the sources via a wget script similar to the one used for the HTML. That way, each page would be a different request to the server, so you'd have to wait in line like the rest of us and not use up loads of resources in one go. Also, you'd only have to get the changed pages rather than all of them.

In addition, it would solve a problem with 'cut and paste'. If I want to copy something from one page to another, then I have to edit the first page, copy it, cancel the edit, and paste it into the new one. There's great potential for messing that up, plus it unnecessarily locks a page. Having a 'view source' option would fix that.

I'd make it so that you could only 'view source' if you can 'show' the page, so on published pages of private webs you don't get to see the source. That, of course, would be easy to change, though.

Thoughts?

• CommentRowNumber11.
• CommentAuthorMike Shulman
• CommentTimeSep 28th 2009

"View source" sounds like a nice option to me.

• CommentRowNumber12.
• CommentAuthorTobyBartels
• CommentTimeSep 28th 2009

Yes, I like that; put it next to ‘Print’ and ‘TeX’, and make it —like ‘TeX’— a simple text file.

• CommentRowNumber13.
• CommentAuthorAndrew Stacey
• CommentTimeSep 29th 2009

Okay, I'll give it a whirl.

This does involve hacking Instiki a little so I'll try it on a test site first and see if it works.

Although I'll develop it on my own machine(s), what I'll do is create a new user on mathforge, maybe 'labdwarf', for development. That'll be where we can test more major changes or features under the same system as the n-lab, but without disturbing the real thing.

If anyone else is reading this and is interested in hacking instiki, let me know.

• CommentRowNumber14.
• CommentAuthorAndrew Stacey
• CommentTimeSep 30th 2009

Jacques has pointed out to me that there are security issues with serving the source as a viewable text/plain file. In short, some versions of IE take a look to see if a text/plain file contains any HTML, whereupon they try to interpret it as HTML. As the source of a page could contain raw HTML, this is a problem.

One possibility is to make the page appear to be the raw source but actually to be a bona fide XHTML document displaying the source, suitably escaped. Copy-and-paste ought to still work since browsers (I think) decode upon copying to the clipboard.
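
A minimal sketch of the escaping involved (shell functions using sed; this illustrates the idea, not Instiki's actual code). The order of the substitutions matters: '&' must be escaped first and unescaped last, or entities get double-processed.

```shell
# Escape page source so it can be embedded in an XHTML document.
escape_html() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Reverse the escaping, e.g. in a download script, to recover the
# original source exactly.
unescape_html() {
  sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g'
}
```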

As for downloading the source of the entire n-lab, we could add a step into the script that converted the XHTML to text and thus recovered the original source.

Which led both me and Jacques to wonder: why exactly do you download the entire source of the n-lab, Toby?

Another option is to have a cron job export the source from the database each day and then put that somewhere as a raw file to download. That would absolutely guarantee that it was an exact copy.
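
Such a cron job could be as simple as a nightly mysqldump (the schedule, paths, user and database names below are illustrative only):

```shell
# Hypothetical crontab entry: at 03:30 each night, dump the wiki
# database and leave a compressed copy where it can be downloaded
# (or fetched by scp) as a plain file.
30 3 * * * mysqldump -u wikiuser wikidb | gzip > /var/www/static/nlab-dump.sql.gz
```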

• CommentRowNumber15.
• CommentAuthorTobyBartels
• CommentTimeSep 30th 2009

> In short, some versions of IE take a look to see if a text/plain file contains any HTML, whereupon they try to interpret it as HTML.

Are we really responsible for working around bugs in IE? (Maybe yes, as long as people keep using it.)

> Which led both me and Jacques to wonder: why exactly do you download the entire source of the n-lab, Toby?

I've been doing it for some time since Urs suggested that people make back-ups. I'm not so concerned about that, since that is now being handled in other ways (ways that even include past revisions), but I still think that (as you agreed):

> I don't really like the idea that you need special permission in order to get your own copy of the entire nLab.

Basically, if we or our successors really make a mess of it, then other people will be able to continue on their own.

• CommentRowNumber16.
• CommentAuthorAndrew Stacey
• CommentTimeSep 30th 2009

Ah, okay. So it's less a dynamic thing than a posterity thing. That makes the single file generated directly from the database the best solution.

I think that there's still a case to be made for having a "source view" so I'll still work on that, but will deal with the export a different way.