• CommentRowNumber1.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

The single most important thing for the protection of the nLab against a ’disaster’ is that we have regular backups of the database in different locations. Nowadays a lot of systems would do this in the cloud, but we can achieve something more or less as good if people are willing to volunteer to regularly download a backup.

An item on the Technical TODO list (nlabmeta), 20 currently, has been to make this possible technically. Prompted by the fact that Jake is making preparations for working on the nLab frontend and needed this, I have now done this (there is now a password protected endpoint one can call to generate an SQL dump and download it).

My question is: would some people be willing to make the downloads regularly (as many as possible!)? And what would be most convenient for you? A user interface where one clicks a button? A script which you run on your own machine? Something like a cron job which runs the script once a day, or week, or whatever on your own machine?

• CommentRowNumber2.
• CommentAuthorUrs
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

I bet there are many people out there, not all of them vocal here, who will be interested in making backups to their private machines. I am thinking of people like Jake, who are not active editors but active readers of the $n$Lab. So this should be very worthwhile.

For somebody like me an interface with a button would be most convenient. People more like you and Jake probably would prefer something more fancy. But most important is that there is any mechanism at all, so if in doubt of what to set up, I’d suggest you start with the option that is easiest for you to set up!

• CommentRowNumber3.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

Hehe, what I have already made is enough for myself and Jake :-). Yes, I think we should definitely have a user interface with the click option, also it can show when a backup was last made. The disadvantage, though, is that it is manual. Whereas one could have something that runs every day, say, on your computer, and tries to fetch a backup, without you needing to do or remember anything. Let me know if that would be of interest (no problem if not). I could make it a mobile app instead, but I guess one typically does not want 2GB downloading to one’s mobile!

• CommentRowNumber4.
• CommentAuthorUrs
• CommentTimeSep 15th 2018

Sure, I’d be interested in having an automatic backup to my machine! Hopefully the job of convincing my machine to do that can be reduced to one or two button clicks, though? Say to hit “download”, then “execute”, then done? :-)

• CommentRowNumber5.
• CommentAuthorDmitri Pavlov
• CommentTimeSep 15th 2018

What happened to https://github.com/ncatlab/nlab-content? It used to be a very convenient way to make backups, but the last commit was on April 16, 2018.

• CommentRowNumber6.

I’ve not looked into it yet, but my first guess would be simply that there is too much content for github.

But in any case there is a big difference between nlab-content and the database: there are all kinds of crucial data in the latter. But I do intend to provide exactly the same functionality for downloading rendered content as for SQL dumps.

• CommentRowNumber7.
• CommentAuthorspitters
• CommentTimeSep 15th 2018

In earlier discussion the idea was to simply ask the github staff for more space.

• CommentRowNumber8.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

Thanks Bas! My feeling, though, is that github is not really the appropriate place for this (they discourage SQL dumps, for instance, and this is not much different). If we host an endpoint on our own server for downloading the rendered content, we achieve more or less the same when it comes to quick recovery of current content, as long as there are people downloading regularly, of course. In other words, storing the database dumps allows us to recover everything, whilst storing rendered content complements this by allowing us to recover more quickly to something usable.

• CommentRowNumber9.
• CommentAuthorMike Shulman
• CommentTimeSep 17th 2018

I’d be happy to do this. I’d want an automated script that does it nightly. But I can set up cron on my own, and write my own shell scripts, so the easiest interface for me would be something like a single URL that my cron job could hit with a wget. But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption), I could run that in my cron too.

About how big would a single backup be?

• CommentRowNumber10.
• CommentAuthorRichard Williamson
• CommentTimeSep 17th 2018
• (edited Sep 17th 2018)

> I’d be happy to do this.

Fantastic!

> The easiest interface for me would be something like a single URL that my cron job could hit with a wget.

We more or less have this now, except that there are two URLs involved (one a sub-URL of the other), and one of them is a POST, not a GET. But if you are on a unix machine, you will almost certainly have the command line tool curl, which is just as easy to run as wget, so you can achieve everything in a two-line script. I can provide you with the curl syntax. [Edit: actually you should be able to use two wget commands instead of two curl commands if you prefer!]

I will create the credentials for you soon (this evening, maybe), and then you can just try it out.

> But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption)

For now I was thinking just of the simplest possible thing that simply generates and downloads the SQL dump, over-riding the old one. But you’re right that it would be a good idea in the end to keep a few hanging around (e.g. one that updates only per week/month) rather than just one. Just one which is over-ridden is much better than the current situation, though, and should protect us from many/most scenarios.
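
One way to "keep a few hanging around" is to prune by count rather than by age, so that the newest backups survive even if downloads stop for a while. The following is only a sketch under stated assumptions: the function name `prune_old_backups` is hypothetical, and it assumes the backups are `.sql.gz` files named by ISO timestamp, so that lexical order is also chronological order.

```shell
#!/usr/bin/env bash
# Sketch only: prune_old_backups is a hypothetical helper, not part of any script in this thread.
# Assumes backup files are named like 2019-01-10T02:00:00.sql.gz, so that
# the lexical order of the glob expansion matches chronological order.

# prune_old_backups DIR KEEP: delete all but the KEEP newest *.sql.gz files in DIR.
prune_old_backups() {
  local dir="$1" keep="$2"
  local files=("$dir"/*.sql.gz)          # oldest first, newest last
  [ -e "${files[0]}" ] || return 0       # no backups present: nothing to do
  local excess=$(( ${#files[@]} - keep ))
  local i
  for (( i = 0; i < excess; i++ )); do
    rm -f -- "${files[i]}"               # remove the oldest surplus files
  done
}
```

For example, `prune_old_backups "$HOME/nlab-backups/daily" 7` would keep the seven most recent files in that directory, however old they are.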

> About how big would a single backup be?

At the moment it’s 2.3GB. If that size is problematic for people, we can look into a more complicated (but also more risk-prone) alternative (e.g. something which does a diff between a particular SQL dump and the current database, and makes only those changes to the previous SQL dump).

• CommentRowNumber11.
• CommentAuthorAli Caglayan
• CommentTimeSep 18th 2018

Richard, what about having a compressed dump? You could dump and compress weekly, so that many more people could download backups?

• CommentRowNumber12.

I’ve finally created a user for Mike and given him instructions on how to do it. I am running a daily cron job to make a backup myself; I hope Mike will be able to do something similar. For the moment I am not overriding the previous dump each day; I am managing it manually.

I’ll see if the process goes smoothly for Mike, and then I can pass on instructions to Urs and anybody else who would be willing to do this.

Re #11: Good ideas! But I’m a little short of time at the moment, so I’ll keep the current simple design for now. I am happy to create a user for anybody willing, though, so that they can do it; it is not difficult.

• CommentRowNumber13.
• CommentAuthorMike Shulman
• CommentTimeJan 10th 2019

Thanks Richard! The script you sent me works perfectly; I am setting up a cron job now. I also added a line to the script that gzips the dump once downloaded, which reduces the size down to ~600M. I think I will also set up a cron job to delete sufficiently old backups, since I don’t want to run out of disk space.
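
As a variation on gzipping after the download, the dump could be compressed while it is being downloaded, so that the uncompressed 2.3GB file never has to sit on disk. This is a sketch, not the script that was actually sent around: the helper name `compress_output_of` and the `.part` suffix are assumptions made for illustration. It only keeps the result if the producing command succeeded and the archive passes `gzip -t`.

```shell
#!/usr/bin/env bash
# Sketch only: compress_output_of is a hypothetical helper.
set -o pipefail  # make "cmd | gzip" fail when cmd itself fails

# compress_output_of OUT_FILE CMD [ARGS...]
# Runs CMD, gzips its standard output into OUT_FILE, and keeps the file only
# if CMD succeeded and the archive passes gzip's integrity test.
compress_output_of() {
  local out_file="$1"
  shift
  local tmp_file="${out_file}.part"
  if "$@" | gzip -c > "$tmp_file" && gzip -t "$tmp_file"; then
    mv "$tmp_file" "$out_file"
  else
    rm -f "$tmp_file"
    return 1
  fi
}

# Possible use in a backup script (URL and header as quoted elsewhere in this thread):
# compress_output_of "$OUT_FILE.gz" curl -fsS -H "Authorization: Basic ((AUTH))" "https://ncatlab.org/sqldump/$SQL_DUMP_ID"
```

Writing to a temporary name first means an interrupted download does not leave behind a file that looks like a complete backup.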

• CommentRowNumber14.

Excellent! Thanks very much! Could you possibly paste your scripts here (minus authentication header)? I think others might like that behaviour too.

• CommentRowNumber15.
• CommentAuthorMike Shulman
• CommentTimeJan 10th 2019

The basic daily backup.sh script is a slight modification of the one you sent me:

SQL_DUMP_ID=$(curl -X POST -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump)
OUT_FILE=$HOME/nlab-backups/daily/$(date +"%Y-%m-%dT%H:%M:%S").sql
curl -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump/$SQL_DUMP_ID > $OUT_FILE
gzip $OUT_FILE


where ((AUTH)) should be replaced with the personal authentication hash. The next script link-to.sh can be run weekly, monthly, etc. with a commandline parameter to hardlink the most recent daily backup to a weekly or monthly directory of longer-ago backups:

FILE=$(ls -t $HOME/nlab-backups/daily | head -1)
ln $HOME/nlab-backups/daily/$FILE $HOME/nlab-backups/$1
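
The reason hardlinking works here is that the daily and the weekly name point at the same data on disk, so deleting the daily name later does not remove the weekly copy, and no extra space is used in the meantime. A small self-contained demonstration, run in a temporary directory with a made-up file name (nothing here touches `$HOME/nlab-backups`):

```shell
#!/usr/bin/env bash
# Demonstration only: the directory layout mirrors the thread, the file name is made up.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/daily" "$demo_dir/weekly"
printf '%s\n' "-- pretend SQL dump" > "$demo_dir/daily/2019-01-10T02:00:00.sql"

# Hardlink the daily file into weekly/: no data is copied, both names refer to the same file.
ln "$demo_dir/daily/2019-01-10T02:00:00.sql" "$demo_dir/weekly/"

# Removing the daily name leaves the weekly name, and the data, in place.
rm "$demo_dir/daily/2019-01-10T02:00:00.sql"
cat "$demo_dir/weekly/2019-01-10T02:00:00.sql"
```

Note that hardlinks only work within a single filesystem, so this approach assumes the daily, weekly and monthly directories all live on the same one.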


Finally the daily delete-old.sh script deletes all but the most recent few of each backup:

find $HOME/nlab-backups/daily/ -mtime +7 -exec rm {} \;
find $HOME/nlab-backups/weekly/ -mtime +56 -exec rm {} \;
find $HOME/nlab-backups/monthly/ -mtime +365 -exec rm {} \;


The idea is that I want to keep some backups from up to a year ago, but those don’t need the finer-grained day-by-day nature of backups from the past week. Here’s my crontab:

0 2 * * * /home/shulman/nlab-backups/backup.sh >/dev/null 2>&1
20 2 * * 0 /home/shulman/nlab-backups/link-to.sh weekly >/dev/null 2>&1
40 2 1 * * /home/shulman/nlab-backups/link-to.sh monthly >/dev/null 2>&1
0 3 * * * /home/shulman/nlab-backups/delete-old.sh >/dev/null 2>&1


I tested all this as much as I could, but there could still be bugs in it since not even one day has actually gone by. (-:

All the scripts should begin with #!/usr/bin/bash of course, but the markdown parser here has trouble with that. (-:

• CommentRowNumber16.
• CommentAuthorRichard Williamson
• CommentTimeJan 13th 2019
• (edited Jan 13th 2019)

Great, thanks very much! I can see from the logs that your cron job is downloading an SQL dump once a day. That’s really good.

I’ll create a user for Urs as well when I get a chance.

• CommentRowNumber17.
• CommentAuthorRichard Williamson
• CommentTimeJan 22nd 2019
• (edited Jan 22nd 2019)

Have created a user for Urs now, and sent instructions over email. I offered him your help as well as mine, Mike, with setting up a cron job :-).

If anybody else would like to help out, just let me know. The more the merrier: the more people we have with cron jobs running at different times of day, the greater chance we have of losing as little data as possible in a ’crisis’.

• CommentRowNumber18.
• CommentAuthorDmitri Pavlov
• CommentTimeOct 19th 2019

Can I also have an authentication token to download SQL dumps?

• CommentRowNumber19.

Hi Dmitri, I have sent this to you over email now.

• CommentRowNumber20.
• CommentAuthorOwen Maresh
• CommentTimeNov 12th 2019

Just joined. Could I have an authentication token for SQL dumps, please?

• CommentRowNumber21.

Hi Owen, yes, you are welcome! Could you let me know an email address to which I can send the necessary info, and then I’ll do so as soon as I can?

• CommentRowNumber22.
• CommentAuthorOwen Maresh
• CommentTimeNov 19th 2019

Richard: I sent email, but it might have been trapped by your spam filter.

• CommentRowNumber23.

Hi Owen, apologies for taking an eternity to respond, I have created a user for you now and sent the details of how to make a backup over email to you.

• CommentRowNumber24.
• CommentAuthorBen Johnsrude
• CommentTimeDec 20th 2019

Hello! As per HowTo, I'd like to download rendered nLab pages for local viewing. How might I go about this? Thank you!

• CommentRowNumber25.

I think that when I wrote that (if indeed it was me who wrote it!) I had intended to make that functionality available in a similar way as to the database backup functionality. I have not in fact written that API yet, but it is fairly trivial; I will try to get to it as soon as I can. Would you prefer to specify individual pages to download, or to download everything?

• CommentRowNumber26.
• CommentAuthorBen Johnsrude
• CommentTimeDec 21st 2019

Hi Richard! I'd prefer to download everything if possible.

• CommentRowNumber27.
• CommentAuthoredef
• CommentTimeNov 1st 2020

Hi! I'm interested in downloading SQL dumps of the nLab. Are there any restrictions on making them available to others?

• CommentRowNumber28.
• CommentAuthorRichard Williamson
• CommentTimeNov 3rd 2020
• (edited Nov 3rd 2020)

Hi, there are no restrictions in general, but it would be good to not create too many SQL dumps, e.g. once a day or less frequently would be reasonable.

Do you have a particular purpose in mind? For those who are downloading the dumps mostly to protect the nLab against disaster, all tables are downloaded, but I can exclude user data in your case (there is nothing very sensitive (passwords are hashed), but it does contain email addresses, which shouldn’t end up everywhere to avoid spam, etc.).

There is an email address attached to your nForum account; can I send a username and password to this?

• CommentRowNumber29.
• CommentAuthorDmitri Pavlov
• CommentTimeJun 28th 2021

https://ncatlab.org/sqldump appears to give 504 Gateway Time-out when I try to refresh my local copy of the nLab.

• CommentRowNumber30.

Hi Dmitri, I have tweaked something now, and think it may have solved the issue. Please check :-).

• CommentRowNumber31.
• CommentAuthorUrs
• CommentTimeJun 29th 2021

Incidentally, there has recently (maybe last three weeks, not sure) been an issue that, sometimes, submitting an edit just brings back the edit page, and in that case sometimes the edit has been saved nevertheless, sometimes it has not. I haven’t been able to reproduce this in a minimal example. But when it happens, so far, another attempt to save will either actually save it or make the system admit that the edit has already been saved, so it’s not a big problem. Or was, in case this is related to the tweak you just made.

• CommentRowNumber32.

Thanks for letting me know, sounds a bit strange. If you are able to re-produce it, do let me know. It should not be related to the change I just made, the SQL dump mechanism is on a different application server.

The summer holiday is approaching in Europe, and this is probably the best chance I will have this year to try to carry out the migration to the cloud, which should be a fresh start with regard to the kind of issue you describe.

• CommentRowNumber33.
• CommentAuthorDmitri Pavlov
• CommentTimeJun 29th 2021

Now I get a new error: 524 Origin Time-out

• CommentRowNumber34.
• CommentAuthorRichard Williamson
• CommentTimeJun 29th 2021
• (edited Jun 29th 2021)

Hmm, thanks for letting me know, it worked for me earlier, but I’ll try to reproduce it. Might not be until tomorrow unfortunately.

• CommentRowNumber35.
• CommentAuthorRichard Williamson
• CommentTimeJul 1st 2021
• (edited Jul 1st 2021)

Hi Dmitri, I tried again now and couldn’t reproduce it (it worked). There might, however, be some timeout that is kicking in for you and just about not for me. Could you try running the following script manually, with the correct authentication token, and post the output? I particularly wish to see how long the first of the two commands takes.

#!/usr/bin/bash

SQL_DUMP_ID=$(curl -X POST -H "Authorization: Basic TODO" https://ncatlab.org/sqldump)

curl -H "Authorization: Basic TODO" https://ncatlab.org/sqldump/$SQL_DUMP_ID > /tmp/$(date +"%Y-%m-%dT%H:%M:%S").sql
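
To see how long the first command takes, the script can be wrapped in a small timing helper. The following is a sketch only: `timed` is a hypothetical name, and it relies on bash's built-in `SECONDS` counter, so it only has whole-second resolution. The report goes to standard error, so the dump ID captured by `$(...)` is unaffected.

```shell
#!/usr/bin/env bash
# Sketch only: timed is a hypothetical helper, not part of the script above.

# timed CMD [ARGS...]: run CMD, then report its exit status and elapsed
# wall-clock seconds on standard error, and return CMD's exit status.
timed() {
  local start=$SECONDS
  "$@"
  local rc=$?
  echo "exit status ${rc} after $((SECONDS - start))s: $1" >&2
  return "$rc"
}

# Possible use with the first command of the script above:
# SQL_DUMP_ID=$(timed curl -X POST -H "Authorization: Basic TODO" https://ncatlab.org/sqldump)
```

Alternatively, curl itself can report timing and the HTTP status through its `-w` option, for instance with the `%{time_total}` and `%{http_code}` variables, which would also show whether a 504 or 524 came back.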

• CommentRowNumber36.
• CommentAuthorDmitri Pavlov
• CommentTimeJul 1st 2021

Yes, I retried it and it now works (the database downloaded correctly).

The problem was with the first command; the error messages I reproduced above were caused by it, not by the second command.

• CommentRowNumber37.

OK, great. As I mentioned, we might be on the boundary of a time out; if the issue recurs, it would be very helpful if you could provide the information that I asked for in #35.

• CommentRowNumber38.
• CommentAuthoredef
• CommentTimeSep 14th 2021

I appear to have dropped the ball on this amidst the general chaos of this era, but I’m still interested in getting hold of a dump.

> Hi, there are no restrictions in general, but it would be good to not create too many SQL dumps, e.g. once a day or less frequently would be reasonable.

I’d definitely be updating my copy daily or less, quite likely closer to weekly.

> Do you have a particular purpose in mind?

I’m initially interested in making nLab browseable from my reMarkable e-paper tablet, so that I can look up concepts while reading papers or taking notes. Depending on their size (and given the lack of restrictions), I would be interested in mirroring them for others to have access to the raw page data freely.

> For those who are downloading the dumps mostly to protect the nLab against disaster, all tables are downloaded, but I can exclude user data in your case (there is nothing very sensitive (passwords are hashed), but it does contain email addresses, which shouldn’t end up everywhere to avoid spam, etc.).

I’m quite happy to do without those.

Yes, that works :)

• CommentRowNumber39.
• CommentAuthorGuest
• CommentTimeSep 27th 2021

Leo: Hello Richard, I was wondering if you could help me with downloading the entire nLab as rendered web pages. I live in an area where the nLab takes a long time to load and is not always available, so it would be great if I could access it offline.

Thanks.

• CommentRowNumber40.
• CommentAuthorGuest
• CommentTimeMay 26th 2022

Hey, anyone: if this is still ongoing, could I have some help? I am keen to help archive the nLab too. You may contact me at contact@tchlabs.net

- Chien Hao

• CommentRowNumber41.
• CommentAuthorUrs
• CommentTimeMay 27th 2022
• (edited May 27th 2022)

Thanks. In case they haven’t seen this yet, I am now forwarding your message to the technical board. Also, we might need to post a general update on the matter.