
• CommentRowNumber1.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 2nd 2014

I would like to keep a local copy of the source of all articles of nLab.

The reasons for this are (a) the nLab is often down; (b) even if the nLab is up, the server often takes a lot of time to respond; (c) once the article is retrieved, the math takes a lot of time to render, especially for long articles, and it looks ugly.

To mitigate these problems, I would like to keep my own local copy of the nLab, compiled using TeX into a DVI file, which can then be viewed locally without any of the above problems, with all the hypertext infrastructure (e.g., the hyperlinks to other nLab articles) available.

The TeX files produced by the nLab server seem to have some problems, e.g., the double-bracketed links to other nLab articles are not processed at all, which forces me to process the source files myself, either by hacking the Instiki source or by writing my own converter to TeX.

The above approach requires one to maintain a local copy of the source of all articles. I consulted the wiki (http://ncatlab.org/nlab/show/HowTo#download), and the currently recommended approach seems to be to generate a list of all articles and then download each one manually.
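For the record, that manual approach might be scripted roughly as follows. This is a hedged sketch only: it assumes a file pages.txt already exists with one page name per line (the HowTo page describes generating such a list); the /nlab/source/ path comes from this discussion, and everything else is illustrative.

```shell
#!/bin/sh
# Hedged sketch: bulk-download nLab page sources, one HTTP request
# per page. Assumes pages.txt lists one page name per line.
mkdir -p sources
while IFS= read -r page; do
  # crude URL encoding (spaces only); real page names may need more
  enc=$(printf '%s' "$page" | sed 's/ /+/g')
  curl -s "https://ncatlab.org/nlab/source/$enc" > "sources/$page"
  sleep 1  # throttle, to be gentle with the server
done < pages.txt
```

Note that this illustrates exactly the inefficiency described below: one full HTTP request per page, with no way to skip unmodified pages.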

This is feasible for the initial download, even though the /nlab/source/ pages wrap the source text in HTML boilerplate (completely useless, in my opinion) that must be stripped manually, unlike the TeX files, which are served as text/plain. It is totally inappropriate for keeping the database updated, however: all pages (including the /nlab/source/ ones) are dynamically generated, which rules out HTTP mechanisms such as If-Modified-Since that would allow one to download only the modified pages. And even with such a mechanism, one would still have to issue an individual HTTP request for every single page on the nLab, which by itself takes a lot of time.

Much more efficient ways exist to synchronize directories with many files: rsync, for example, can compute a list of modified parts of modified files and retrieve just those in a few seconds. This also puts much less stress on the web server than the methods cited above.

Would it be too much to ask that rsync be enabled (e.g., using rsyncd) on the nLab web server (in read-only mode, of course)?
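To make the request concrete, here is a hedged sketch of what such a setup could look like. Nothing like this exists on the server; the module name "nlab-sources" and the paths are purely hypothetical.

```shell
# Hedged sketch; the module name and paths are hypothetical.
#
# Server side: a read-only module in /etc/rsyncd.conf might look like:
#
#   [nlab-sources]
#       path = /var/www/nlab/sources
#       read only = yes
#
# Client side: mirror the sources, transferring only changed parts:
rsync -az --delete rsync://ncatlab.org/nlab-sources/ ./nlab-sources/
```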

• CommentRowNumber2.
• CommentTimeAug 2nd 2014
• (edited Aug 2nd 2014)

There is a bzr repository, containing the full sources and all revisions of all nLab pages, that can probably be used for this purpose. I will give you some instructions here later today or tomorrow.

• CommentRowNumber3.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 2nd 2014

Great, many thanks! Any decent revision control system should be nearly as efficient as rsync, I'd guess.

• CommentRowNumber4.
• CommentTimeAug 4th 2014

It turns out there was an old page here with instructions on how to do this. I’ve updated it; let me know if you have any problems following the instructions. (The seed file is actually still uploading at the moment, but it should be available in a couple of hours.)

• CommentRowNumber5.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 5th 2014
• (edited Aug 5th 2014)

Excellent, many thanks for your help!

Everything works fine, bzr pull was successful.
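For anyone following along, the workflow boils down to a couple of commands. This is a hedged sketch: the repository URL below is a placeholder, and the real address is on the updated HowTo page mentioned above.

```shell
# Hedged sketch of the bzr workflow; the repository URL is a
# placeholder -- the real address is on the HowTo page.
bzr branch http://example.org/nlab-repo nlab   # initial full copy
cd nlab
bzr pull        # later runs fetch only the new revisions
bzr log -r -1   # inspect the most recent revision pulled
```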

• CommentRowNumber6.
• CommentAuthorUrs
• CommentTimeAug 10th 2014

It would generally be good if a few people kept full backups of the full nLab sources locally, as a safety measure. Should I disappear for good, I hope that after a while somebody with such a copy will set up the nLab again.

• CommentRowNumber7.
• CommentAuthorTodd_Trimble
• CommentTimeAug 10th 2014

Should I disappear for good

Do you mean this in the sense that you (like anyone else) might die on any given day, or do you mean that you are seriously considering disappearing from the nLab?

You’ve been missed recently.

• CommentRowNumber8.
• CommentAuthorspitters
• CommentTimeAug 10th 2014

Indeed, your message seems a bit worrisome. What is the current backup policy? Is the nLab protected against a hard-disk crash?

• CommentRowNumber9.
• CommentAuthorUrs
• CommentTimeAug 10th 2014
• (edited Aug 10th 2014)

I am not going to give up on the nLab voluntarily. But who knows what will happen. I suppose the server provider also makes backups, but what if we lose contact with them, either because they go out of business or because I lose my life, my mind, or, worse, my credit card?

As with DNA, the way to preserve volatile data through the millennia is to keep making distributed copies.

• CommentRowNumber10.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

So, how big is it? It should be possible to find cloud back-up storage somewhere.

• CommentRowNumber11.
• CommentTimeAug 11th 2014

As a bzr repository the data is about 600 MB (about 500 MB archived).

• CommentRowNumber12.
• CommentAuthorzskoda
• CommentTimeAug 11th 2014
• (edited Aug 11th 2014)

full nLab sources

Does “full” mean that it also contains the historical versions of the pages, or just the current version?

• CommentRowNumber13.
• CommentTimeAug 11th 2014

@12: it contains all the historical revisions, and all metadata.

• CommentRowNumber14.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

Maybe back up to GitHub? https://stackoverflow.com/questions/12019834/pushing-from-bazaar-to-github

Just syncing with, say, SpiderOak is another option. They provide 2 GB for free.

• CommentRowNumber15.
• CommentAuthorAndrew Stacey
• CommentTimeAug 11th 2014

It’s bzr only for historical reasons. It wouldn’t be impossible to convert it completely to git and then push to GitHub. For security, the nLab should have its own GitHub account (since it would need an SSH key with an empty passphrase to do the push). Alternatively, there’s Launchpad, which is the Bazaar equivalent of GitHub.

The repository contains as much of the information as I could extract from the database without compromising personal data (i.e., web passwords). It is possible to reconstruct the nLab from the repository (and I have a script that does exactly that). Hmm, if the nLab had a GitHub account, then I could dump all the scripts I’ve written over the years there as well, which would be as good a way of handing them on as any.
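A hedged sketch of what such a conversion could look like, assuming the bzr fast-import plugin is available. All paths and the GitHub remote address are placeholders, not anything that currently exists.

```shell
# Hedged sketch: bzr -> git conversion via a fast-export stream.
# Requires the bzr-fastimport plugin; paths and the remote address
# are placeholders.
git init nlab-git
bzr fast-export --plain nlab-bzr | (cd nlab-git && git fast-import)
cd nlab-git
git checkout -f master                                 # materialize the working tree
git remote add origin git@github.com:example/nlab.git  # hypothetical remote
git push -u origin master
```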

• CommentRowNumber16.
• CommentAuthorUrs
• CommentTimeAug 11th 2014

Hi Bas,

thanks for the suggestions. Wanna go ahead and lend a hand?

• CommentRowNumber17.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

I’d be happy to help a bit. Should we wait for the MediaWiki experiment? I’ve heard good things about MediaWiki’s git integration; e.g., this seems simple enough:

http://tech.tiefpunkt.com/2013/01/pushing-mediawiki-powered-wikis-to-github/

Is the nLab server running Linux?

• CommentRowNumber18.
• CommentAuthorUrs
• CommentTimeAug 11th 2014

Many things one could try to do eventually. But for the moment, do you think you could, using what has been discussed above in this thread, write a script that regularly downloads the nLab bzr repository and stores a copy in one of the places you have been suggesting?

• CommentRowNumber19.