• CommentRowNumber1.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

The single most important thing for the protection of the nLab against a ’disaster’ is that we have regular backups of the database in different locations. Nowadays a lot of systems would do this in the cloud, but we can achieve something more or less as good if people are willing to volunteer to regularly download a backup.

An item on the Technical TODO list (nlabmeta), currently item 20, has been to make this technically possible. Prompted by the fact that Jake is preparing to work on the nLab frontend and needed this, I have now done it: there is now a password-protected endpoint that one can call to generate an SQL dump and download it.

My question is: would some people be willing to make the downloads regularly (as many as possible!)? And what would be most convenient for you? A user interface where one clicks a button? A script which you run on your own machine? Something like a cron job which runs the script once a day, or week, or whatever on your own machine?

• CommentRowNumber2.
• CommentAuthorUrs
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

Great that you bring this up. Every now and then I am getting nervous about this.

I bet there are many people out there, not all of them vocal here, who will be interested in making backups to their private machines. I am thinking of people like Jake, who are not active editors but active readers of the $n$Lab. So this should be very worthwhile.

For somebody like me an interface with a button would be most convenient. People more like you and Jake probably would prefer something more fancy. But most important is that there is any mechanism at all, so if in doubt of what to set up, I’d suggest you start with the option that is easiest for you to set up!

• CommentRowNumber3.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

Hehe, what I have already made is enough for myself and Jake :-). Yes, I think we should definitely have a user interface with the click option; it could also show when a backup was last made. The disadvantage, though, is that it is manual, whereas one could have something that runs every day, say, on your computer and tries to fetch a backup, without you needing to do or remember anything. Let me know if that would be of interest (no problem if not). I could make it a mobile app instead, but I guess one typically does not want 2GB downloading to one’s mobile!

• CommentRowNumber4.
• CommentAuthorUrs
• CommentTimeSep 15th 2018

Sure, I’d be interested in having an automatic backup to my machine! Hopefully the job of convincing my machine to do that can be reduced to one or two button clicks, though? Say to hit “download”, then “execute”, then done? :-)

• CommentRowNumber5.
• CommentAuthorDmitri Pavlov
• CommentTimeSep 15th 2018

What happened to https://github.com/ncatlab/nlab-content? It used to be a very convenient way to make backups, but the last commit was on April 16, 2018.

• CommentRowNumber6.

I’ve not looked into it yet, but my first guess would be simply that there is too much content for github.

But in any case there is a big difference between nlab-content and the database: there are all kinds of crucial data in the latter. I do intend, though, to provide for rendered content exactly the same download functionality as for SQL dumps.

• CommentRowNumber7.
• CommentAuthorspitters
• CommentTimeSep 15th 2018

In earlier discussion the idea was to simply ask the github staff for more space.

• CommentRowNumber8.
• CommentAuthorRichard Williamson
• CommentTimeSep 15th 2018
• (edited Sep 15th 2018)

Thanks Bas! My feeling, though, is that github is not really the appropriate place for this (they discourage SQL dumps, for instance, and this is not much different). If we host an endpoint on our own server for downloading the rendered content, we achieve more or less the same when it comes to quick recovery of current content, as long as there are people downloading regularly, of course. In other words, storing the database dumps allows us to recover everything, whilst storing rendered content complements this by allowing us to recover more quickly to something usable.

• CommentRowNumber9.
• CommentAuthorMike Shulman
• CommentTimeSep 17th 2018

I’d be happy to do this. I’d want an automated script that does it nightly. But I can set up cron on my own, and write my own shell scripts, so the easiest interface for me would be something like a single URL that my cron job could hit with a wget. But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption), I could run that in my cron too.

About how big would a single backup be?

• CommentRowNumber10.
• CommentAuthorRichard Williamson
• CommentTimeSep 17th 2018
• (edited Sep 17th 2018)

I’d be happy to do this.

Fantastic!

The easiest interface for me would be something like a single URL that my cron job could hit with a wget.

We more or less have this now, except that there are two URLs involved (one a sub-URL of the other), and one of them is a POST, not a GET. But if you are on a unix machine, you will almost certainly have the command line tool curl, which is just as easy to run as wget, so you can achieve everything in a two-line script. I can provide you with the curl syntax. [Edit: actually you should be able to use two wget commands instead of two curl commands if you prefer!]

I will create the credentials for you soon (this evening, maybe), and then you can just try it out.
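For readers who prefer wget, the two-step flow described above (a POST that starts the dump, then a GET on the sub-URL) might be sketched as follows. This is only a sketch under assumptions: the endpoint `https://ncatlab.org/sqldump` and the `Authorization: Basic` header are taken from the curl script posted later in this thread, while the function name `nlab_dump_wget` and the variable `NLAB_AUTH` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the two-request dump flow using wget instead of curl.
# NLAB_AUTH (hypothetical variable name) holds your personal authentication hash.

nlab_dump_wget() {
  local out_file=$1
  local base_url=${2:-https://ncatlab.org/sqldump}
  local dump_id

  # Step 1: POST to the endpoint; the response body is taken to be the dump ID.
  # Passing --post-data (even empty) makes wget issue a POST instead of a GET.
  dump_id=$(wget -q -O - --post-data='' \
    --header="Authorization: Basic ${NLAB_AUTH}" "$base_url") || return 1
  [ -n "$dump_id" ] || return 1

  # Step 2: GET the sub-URL for that ID and save the SQL dump.
  wget -q -O "$out_file" \
    --header="Authorization: Basic ${NLAB_AUTH}" "$base_url/$dump_id"
}
```

It could be called as `NLAB_AUTH='((AUTH))' nlab_dump_wget "$HOME/nlab-backups/latest.sql"`, with ((AUTH)) replaced by the personal authentication hash.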

But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption)

For now I was thinking just of the simplest possible thing, which simply generates and downloads the SQL dump, overwriting the old one. But you’re right that it would be a good idea in the end to keep a few hanging around (e.g. one that updates only once per week/month) rather than just one. Just one which is overwritten is much better than the current situation, though, and should protect us from many/most scenarios.

About how big would a single backup be?

At the moment it’s 2.3GB. If that size is problematic for people, we can look into a more complicated (but also more risk-prone) alternative (e.g. something which does a diff between a particular SQL dump and the current database, and makes only those changes to the previous SQL dump).

• CommentRowNumber11.
• CommentAuthorAli Caglayan
• CommentTimeSep 18th 2018
Richard, what about having a compressed dump? You could dump and compress weekly, so that many more people could download backups.

• CommentRowNumber12.

I’ve finally created a user for Mike and given him instructions on how to do it. I am running a daily cron job to make a backup myself; I hope Mike will be able to do something similar. For the moment I am not overwriting each day; I am managing it manually.

I’ll see if the process goes smoothly for Mike, and then I can pass on instructions to Urs and anybody else who would be willing to do this.

Re #11: Good ideas! But I’m a little short of time at the moment, so I’ll keep the current simple design for now. I am happy to create a user for anybody willing, though, so that they can do it; it is not difficult.

• CommentRowNumber13.
• CommentAuthorMike Shulman
• CommentTimeJan 10th 2019

Thanks Richard! The script you sent me works perfectly; I am setting up a cron job now. I also added a line to the script that gzips the dump once downloaded, which reduces the size down to ~600M. I think I will also set up a cron job to delete sufficiently old backups, since I don’t want to run out of disk space.

• CommentRowNumber14.

Excellent! Thanks very much! Could you possibly paste your scripts here (minus authentication header)? I think others might like that behaviour too.

• CommentRowNumber15.
• CommentAuthorMike Shulman
• CommentTimeJan 10th 2019

The basic daily backup.sh script is a slight modification of the one you sent me:

SQL_DUMP_ID=$(curl -X POST -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump)
OUT_FILE=$HOME/nlab-backups/daily/$(date +"%Y-%m-%dT%H:%M:%S").sql
curl -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump/$SQL_DUMP_ID > $OUT_FILE
gzip $OUT_FILE


where ((AUTH)) should be replaced with the personal authentication hash. The next script link-to.sh can be run weekly, monthly, etc. with a commandline parameter to hardlink the most recent daily backup to a weekly or monthly directory of longer-ago backups:

FILE=$(ls -t $HOME/nlab-backups/daily | head -1)
ln $HOME/nlab-backups/daily/$FILE $HOME/nlab-backups/$1
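Because link-to.sh makes a hard link rather than a copy, the weekly or monthly entry takes no extra disk space and survives deletion of the daily file. A small self-contained demonstration in a temporary directory may make this concrete; the function name `demo_hardlink` and the file names are made up for illustration, and hard links only work within a single filesystem.

```shell
#!/usr/bin/env bash
# Demonstration only: a hard link keeps the data alive after the original name is removed.
demo_hardlink() {
  local dir
  dir=$(mktemp -d) || return 1
  mkdir "$dir/daily" "$dir/weekly"

  echo "pretend SQL dump" > "$dir/daily/example.sql.gz"

  # Same operation as link-to.sh: a second name for the same file, no copy made.
  ln "$dir/daily/example.sql.gz" "$dir/weekly/"

  # Removing the daily name (as a cleanup job eventually would) leaves the weekly name intact.
  rm "$dir/daily/example.sql.gz"
  cat "$dir/weekly/example.sql.gz"

  # Clean up the temporary directory.
  rm -r "$dir"
}
```

Running `demo_hardlink` prints the file's contents from the weekly directory even though the daily name has already been removed.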


Finally the daily delete-old.sh script deletes all but the most recent few of each backup:

find $HOME/nlab-backups/daily/ -mtime +7 -exec rm {} \;
find $HOME/nlab-backups/weekly/ -mtime +56 -exec rm {} \;
find $HOME/nlab-backups/monthly/ -mtime +365 -exec rm {} \;
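Before letting an age-based `find ... -exec rm` job loose on real backups, it can be reassuring to preview what it would match. One way to do that, sketched here with a hypothetical helper named `list_expired` (not part of the scripts in this thread), is to run the same age test with `-print` in place of `-exec rm`:

```shell
#!/usr/bin/env bash
# Sketch: show which files an age-based cleanup would delete, without deleting anything.
# list_expired is a hypothetical helper name, not part of the scripts in this thread.
list_expired() {
  local dir=$1 days=$2
  # -type f restricts the match to regular files, so the directory itself is never listed.
  # -mtime +N matches files whose age, counted in whole 24-hour periods, is greater than N.
  find "$dir" -type f -mtime "+$days" -print
}
```

For example, `list_expired "$HOME/nlab-backups/daily" 7` lists the daily backups that the first cleanup line would remove.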


The idea is that I want to keep some backups from up to a year ago, but those don’t need the finer-grained day-by-day nature of backups from the past week. Here’s my crontab:

0 2 * * * /home/shulman/nlab-backups/backup.sh >/dev/null 2>&1
20 2 * * 0 /home/shulman/nlab-backups/link-to.sh weekly >/dev/null 2>&1
40 2 1 * * /home/shulman/nlab-backups/link-to.sh monthly >/dev/null 2>&1
0 3 * * * /home/shulman/nlab-backups/delete-old.sh >/dev/null 2>&1


I tested all this as much as I could, but there could still be bugs in it since not even one day has actually gone by. (-:

All the scripts should begin with #!/usr/bin/bash of course, but the markdown parser here has trouble with that. (-:
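Given the concern raised earlier in the thread about corrupted backups, one possible refinement of backup.sh is to reject an empty download and to check the compressed file with `gzip -t` before trusting it. The following is a sketch under assumptions, not the script actually in use: the helper name `compress_and_verify` is made up, and it only handles the local file, leaving the curl calls unchanged.

```shell
#!/usr/bin/env bash
# Sketch: compress a downloaded dump and verify the result.
# compress_and_verify is a hypothetical helper; it returns non-zero on any problem.
compress_and_verify() {
  local sql_file=$1

  # -s is true only if the file exists and is non-empty; an empty dump
  # suggests that the download failed (e.g. bad credentials or a server error).
  if [ ! -s "$sql_file" ]; then
    echo "empty or missing dump: $sql_file" >&2
    return 1
  fi

  # gzip replaces FILE with FILE.gz; gzip -t then tests the archive's integrity.
  gzip "$sql_file" && gzip -t "$sql_file.gz"
}
```

In backup.sh this would take the place of the final gzip line, for instance as `compress_and_verify "$OUT_FILE" || exit 1`. Since the crontab sends all output to /dev/null, a failure would still go unnoticed unless that redirection is changed.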

• CommentRowNumber16.
• CommentAuthorRichard Williamson
• CommentTimeJan 13th 2019
• (edited Jan 13th 2019)

Great, thanks very much! I can see from the logs that your cron job is downloading an SQL dump once a day. That’s really good. I’ll create a user for Urs as well when I get a chance.

• CommentRowNumber17.
• CommentAuthorRichard Williamson
• CommentTimeJan 22nd 2019
• (edited Jan 22nd 2019)

Have created a user for Urs now, and sent instructions over email. I offered him your help as well as mine, Mike, with setting up a cron job :-).

If anybody else would like to help out, just let me know. The more the merrier: the more people we have with cron jobs running at different times of day, the greater chance we have of losing as little data as possible in a ’crisis’.

• CommentRowNumber18.
• CommentAuthorDmitri Pavlov
• CommentTimeOct 19th 2019
Can I also have an authentication token to download SQL dumps?

• CommentRowNumber19.

Hi Dmitri, I have sent this to you over email now.

• CommentRowNumber20.
• CommentAuthorOwen Maresh
• CommentTimeNov 12th 2019
Just joined. Could I have an authentication token for SQL dumps, please?

• CommentRowNumber21.

Hi Owen, yes, you are welcome! Could you let me know an email address to which I can send the necessary info, and then I’ll do so as soon as I can?

• CommentRowNumber22.
• CommentAuthorOwen Maresh
• CommentTimeNov 19th 2019
Richard: I sent an email, but it might have been trapped by your spam filter.

• CommentRowNumber23.

Hi Owen, apologies for taking an eternity to respond. I have created a user for you now and emailed you the details of how to make a backup.

• CommentRowNumber24.
• CommentAuthorBen Johnsrude
• CommentTimeDec 20th 2019
Hello! As per HowTo, I'd like to download rendered nLab pages for local viewing. How might I go about this?

Thank you!

• CommentRowNumber25.

I think that when I wrote that (if indeed it was me who wrote it!) I had intended to make that functionality available in a similar way to the database backup functionality. I have not in fact written that API yet, but it is fairly trivial; I will try to get to it as soon as I can. Would you prefer to specify individual pages to download, or to download everything?

• CommentRowNumber26.
• CommentAuthorBen Johnsrude
• CommentTimeDec 21st 2019
Hi Richard! I'd prefer to download everything if possible.
• CommentRowNumber27.
• CommentAuthoredef
• CommentTimeNov 1st 2020
Hi! I'm interested in downloading SQL dumps of the nLab. Are there any restrictions on making them available to others?
• CommentRowNumber28.
• CommentAuthorRichard Williamson
• CommentTimeNov 3rd 2020
• (edited Nov 3rd 2020)

Hi, there are no restrictions in general, but it would be good not to create too many SQL dumps; once a day or less frequently would be reasonable. Do you have a particular purpose in mind? For those who are downloading the dumps mostly to protect the nLab against disaster, all tables are downloaded, but I can exclude user data in your case. There is nothing very sensitive in it (passwords are hashed), but it does contain email addresses, which shouldn’t end up everywhere, to avoid spam, etc.

There is an email address attached to your nForum account; can I send a username and password to this?
