
• CommentRowNumber1.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 2nd 2014

I would like to keep a local copy of the source of all articles of nLab.

The reasons for this are (a) the nLab is often down; (b) even if the nLab is up, the server often takes a lot of time to respond; (c) once the article is retrieved, the math takes a lot of time to render, especially for long articles, and it looks ugly.

To mitigate these problems, I would like to keep my own local copy of the nLab, compiled using TeX into a DVI file, which can then be viewed locally without any of the above problems, with all the hypertext infrastructure (e.g., the hyperlinks to other nLab articles) available.

The TeX files produced by the nLab server seem to have some problems, e.g., the double-bracketed links to other nLab articles are not processed at all, which forces me to process the source files myself, either by hacking the Instiki source or by writing my own converter to TeX.

The above approach requires one to maintain a local copy of the source of all articles. I consulted the wiki (http://ncatlab.org/nlab/show/HowTo#download), and the currently recommended approach seems to be to generate a list of all articles and then download each one manually.
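For the record, that manual approach might be scripted roughly as follows. This is a hedged sketch only: it assumes a file pages.txt already exists with one page name per line (the HowTo page describes generating such a list); the /nlab/source/ path comes from this discussion, and everything else is illustrative.

```shell
#!/bin/sh
# Hedged sketch: bulk-download nLab page sources, one HTTP request
# per page. Assumes pages.txt lists one page name per line.
mkdir -p sources
while IFS= read -r page; do
  # crude URL encoding (spaces only); real page names may need more
  enc=$(printf '%s' "$page" | sed 's/ /+/g')
  curl -s "https://ncatlab.org/nlab/source/$enc" > "sources/$page"
  sleep 1  # throttle, to be gentle with the server
done < pages.txt
```

Note that this illustrates exactly the inefficiency described below: one full HTTP request per page, with no way to skip unmodified pages.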

This is feasible for the initial download, even though the /nlab/source/ pages wrap the source text in HTML boilerplate (completely useless, in my opinion) that must be stripped manually, unlike the TeX files, which are served as text/plain. It is totally inappropriate for keeping the database updated, however: all pages (including the /nlab/source/ ones) are dynamically generated, which rules out HTTP mechanisms such as If-Modified-Since that would allow one to download only the modified pages. And even with such a mechanism, one would still have to issue an individual HTTP request for every single page on the nLab, which by itself takes a lot of time.

Much more efficient ways exist to synchronize directories with many files: rsync, for example, can compute a list of modified parts of modified files and retrieve just those in a few seconds. This also puts much less stress on the web server than the methods cited above.

Would it be too much to ask that rsync be enabled (e.g., using rsyncd) on the nLab web server (in read-only mode, of course)?
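To make the request concrete, here is a hedged sketch of what such a setup could look like. Nothing like this exists on the server; the module name "nlab-sources" and the paths are purely hypothetical.

```shell
# Hedged sketch; the module name and paths are hypothetical.
#
# Server side: a read-only module in /etc/rsyncd.conf might look like:
#
#   [nlab-sources]
#       path = /var/www/nlab/sources
#       read only = yes
#
# Client side: mirror the sources, transferring only changed parts:
rsync -az --delete rsync://ncatlab.org/nlab-sources/ ./nlab-sources/
```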

• CommentRowNumber2.
• CommentTimeAug 2nd 2014
• (edited Aug 2nd 2014)

There is a bzr repository, containing the full sources and all revisions of all nLab pages, that can probably be used for this purpose. I will give you some instructions here later today or tomorrow.

• CommentRowNumber3.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 2nd 2014

Great, many thanks! Any decent revision control system should be nearly as efficient as rsync, I'd guess.

• CommentRowNumber4.
• CommentTimeAug 4th 2014

It turns out there was an old page here with instructions on how to do this. I’ve updated it; let me know if you have any problems following the instructions. (The seed file is actually still uploading at the moment, but it should be available in a couple of hours.)

• CommentRowNumber5.
• CommentAuthorDmitri Pavlov
• CommentTimeAug 5th 2014
• (edited Aug 5th 2014)

Excellent, many thanks for your help!

Everything works fine, bzr pull was successful.
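For anyone following along, the workflow boils down to a couple of commands. This is a hedged sketch: the repository URL below is a placeholder, and the real address is on the updated HowTo page mentioned above.

```shell
# Hedged sketch of the bzr workflow; the repository URL is a
# placeholder -- the real address is on the HowTo page.
bzr branch http://example.org/nlab-repo nlab   # initial full copy
cd nlab
bzr pull        # later runs fetch only the new revisions
bzr log -r -1   # inspect the most recent revision pulled
```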

• CommentRowNumber6.
• CommentAuthorUrs
• CommentTimeAug 10th 2014

It would generally be good if a few people kept full backups of the full nLab sources locally, as a safety measure. Should I disappear for good, I hope that after a while somebody with such a copy will set up the nLab again.

• CommentRowNumber7.
• CommentAuthorTodd_Trimble
• CommentTimeAug 10th 2014

Should I disappear for good

Do you mean this in the sense that you (like anyone else) might die on any given day, or do you mean that you are seriously considering disappearing from the nLab?

You’ve been missed recently.

• CommentRowNumber8.
• CommentAuthorspitters
• CommentTimeAug 10th 2014

Indeed, your message seems a bit worrisome. What is the current backup policy? Is the nLab protected against a hard-disk crash?

• CommentRowNumber9.
• CommentAuthorUrs
• CommentTimeAug 10th 2014
• (edited Aug 10th 2014)

I am not going to give up on the nLab voluntarily. But who knows what will happen. I suppose the server provider also makes backups, but what if we lose contact with them, either because they go out of business or because I lose my life, my mind, or, worse, my credit card?

As with DNA, the way to preserve volatile data through the millennia is to keep making distributed copies.

• CommentRowNumber10.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

So, how big is it? It should be possible to find cloud back-up storage somewhere.

• CommentRowNumber11.
• CommentTimeAug 11th 2014

As a bzr repository the data is about 600 MB (about 500 MB archived).

• CommentRowNumber12.
• CommentAuthorzskoda
• CommentTimeAug 11th 2014
• (edited Aug 11th 2014)

full nLab sources

Does “full” mean that it also contains the historical versions of the pages, or just the current version?

• CommentRowNumber13.
• CommentTimeAug 11th 2014

@12: it contains all the historical revisions, and all metadata.

• CommentRowNumber14.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

Maybe back up to GitHub? https://stackoverflow.com/questions/12019834/pushing-from-bazaar-to-github

Just syncing with, say, SpiderOak is another option. They provide 2 GB for free.

• CommentRowNumber15.
• CommentAuthorAndrew Stacey
• CommentTimeAug 11th 2014

It’s bzr only for historical reasons. It wouldn’t be impossible to convert it completely to git and then push to GitHub. For security, the nLab should have its own GitHub account (since it would need an SSH key with an empty passphrase to do the push). Alternatively, there’s Launchpad, which is the Bazaar equivalent of GitHub.

The repository contains as much of the information as I could extract from the database without compromising personal data (i.e., web passwords). It is possible to reconstruct the nLab from the repository (and I have a script that does exactly that). Hmm, if the nLab had a GitHub account, then I could dump all the scripts I’ve written over the years there as well, which would be as good a way of handing them on as any.
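A hedged sketch of what such a conversion could look like, assuming the bzr fast-import plugin is available. All paths and the GitHub remote address are placeholders, not anything that currently exists.

```shell
# Hedged sketch: bzr -> git conversion via a fast-export stream.
# Requires the bzr-fastimport plugin; paths and the remote address
# are placeholders.
git init nlab-git
bzr fast-export --plain nlab-bzr | (cd nlab-git && git fast-import)
cd nlab-git
git checkout -f master                                 # materialize the working tree
git remote add origin git@github.com:example/nlab.git  # hypothetical remote
git push -u origin master
```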

• CommentRowNumber16.
• CommentAuthorUrs
• CommentTimeAug 11th 2014

Hi Bas,

thanks for the suggestions. Wanna go ahead and lend a hand?

• CommentRowNumber17.
• CommentAuthorspitters
• CommentTimeAug 11th 2014

I’d be happy to help a bit. Should we wait for the MediaWiki experiment? I’ve heard good things about MediaWiki’s git integration; e.g., this seems simple enough:

http://tech.tiefpunkt.com/2013/01/pushing-mediawiki-powered-wikis-to-github/

Is the nLab server running Linux?

• CommentRowNumber18.
• CommentAuthorUrs
• CommentTimeAug 11th 2014

Many things one could try to do eventually. But for the moment, do you think you could, using what has been discussed above in this thread, write a script that regularly downloads the nLab bzr repository and stores a copy in one of the places you have been suggesting?

• CommentRowNumber19.