
    • CommentRowNumber1.
    • CommentAuthorRichard Williamson
    • CommentTimeSep 15th 2018
    • (edited Sep 15th 2018)

    The single most important thing for the protection of the nLab against a ’disaster’ is that we have regular backups of the database in different locations. Nowadays a lot of systems would do this in the cloud, but we can achieve something more or less as good if people are willing to volunteer to regularly download a backup.

    An item on the Technical TODO list (nlabmeta), currently item 20, has been to make this technically possible. Prompted by the fact that Jake is making preparations for working on the nLab frontend and needed this, I have now done it: there is now a password-protected endpoint one can call to generate an SQL dump and download it.

    My question is: would some people be willing to make the downloads regularly (as many as possible!)? And what would be most convenient for you? A user interface where one clicks a button? A script which you run on your own machine? Something like a cron job which runs the script once a day, or week, or whatever on your own machine?

    • CommentRowNumber2.
    • CommentAuthorUrs
    • CommentTimeSep 15th 2018
    • (edited Sep 15th 2018)

    Great that you bring this up. Every now and then I get nervous about this.

    I bet there are many people out there, not all of them vocal here, who will be interested in making backups to their private machines. I am thinking of people like Jake, who are not active editors but active readers of the nLab. So this should be very worthwhile.

    For somebody like me an interface with a button would be most convenient. People more like you and Jake would probably prefer something fancier. But the most important thing is that there is some mechanism at all, so if in doubt about what to set up, I’d suggest you start with the option that is easiest for you to set up!

    • CommentRowNumber3.
    • CommentAuthorRichard Williamson
    • CommentTimeSep 15th 2018
    • (edited Sep 15th 2018)

    Hehe, what I have already made is enough for myself and Jake :-). Yes, I think we should definitely have a user interface with a click option; it can also show when a backup was last made. The disadvantage, though, is that it is manual. Whereas one could have something that runs every day, say, on your computer, and tries to fetch a backup, without you needing to do or remember anything. Let me know if that would be of interest (no problem if not). I could make it a mobile app instead, but I guess one typically does not want 2GB downloading to one’s mobile!

    • CommentRowNumber4.
    • CommentAuthorUrs
    • CommentTimeSep 15th 2018

    Sure, I’d be interested in having an automatic backup to my machine! Hopefully the job of convincing my machine to do that can be reduced to one or two button clicks, though? Say to hit “download”, then “execute”, then done? :-)

    • CommentRowNumber5.
    • CommentAuthorDmitri Pavlov
    • CommentTimeSep 15th 2018

    What happened to https://github.com/ncatlab/nlab-content? It used to be a very convenient way to make backups, but the last commit was on April 16, 2018.

    • CommentRowNumber6.
    • CommentAuthorRichard Williamson
    • CommentTimeSep 15th 2018

    I’ve not looked into it yet, but my first guess would be simply that there is too much content for github.

    In any case, there is a big difference between nlab-content and the database: there is all kinds of crucial data in the latter. But I do intend to provide exactly the same functionality for downloading rendered content as for the SQL dumps.

    • CommentRowNumber7.
    • CommentAuthorspitters
    • CommentTimeSep 15th 2018

    In an earlier discussion the idea was simply to ask the github staff for more space.

    • CommentRowNumber8.
    • CommentAuthorRichard Williamson
    • CommentTimeSep 15th 2018
    • (edited Sep 15th 2018)

    Thanks Bas! My feeling, though, is that github is not really the appropriate place for this (they discourage SQL dumps, for instance, and this is not much different). If we host an endpoint on our own server for downloading the rendered content, we achieve more or less the same when it comes to quick recovery of current content, as long as there are people downloading regularly, of course. In other words, storing the database dumps allows us to recover everything, whilst storing rendered content complements this by allowing us to recover more quickly to something usable.

    • CommentRowNumber9.
    • CommentAuthorMike Shulman
    • CommentTimeSep 17th 2018

    I’d be happy to do this. I’d want an automated script that does it nightly. But I can set up cron on my own, and write my own shell scripts, so the easiest interface for me would be something like a single URL that my cron job could hit with a wget. But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption), I could run that in my cron too.

    About how big would a single backup be?

    • CommentRowNumber10.
    • CommentAuthorRichard Williamson
    • CommentTimeSep 17th 2018
    • (edited Sep 17th 2018)

    I’d be happy to do this.

    Fantastic!

    The easiest interface for me would be something like a single URL that my cron job could hit with a wget.

    We more or less have this now, except that there are two URLs involved (one a sub-URL of the other), and one of them is a POST, not a GET. But if you are on a unix machine, you will almost certainly have the command-line tool curl, which is just as easy to run as wget, so you can achieve everything in a two-line script. I can provide you with the curl syntax. [Edit: actually you should be able to use two wget commands instead of two curl commands if you prefer!]
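
    For illustration, a wget version of such a two-line script might look roughly like the following; this is a sketch only, assuming the https://ncatlab.org/sqldump endpoint and Basic-auth header that come up later in this thread, with ((AUTH)) standing for the personal authentication hash and with the assumption that the POST response body is the dump id:

    # request a new dump; the response body is assumed to be the dump id
    SQL_DUMP_ID=$(wget -qO- --post-data='' --header="Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump)
    # download the generated dump to the current directory
    wget --header="Authorization: Basic ((AUTH))" -O nlab.sql https://ncatlab.org/sqldump/$SQL_DUMP_ID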

    I will create the credentials for you soon (this evening, maybe), and then you can just try it out.

    But if you want to write a simple script that does something more complicated (maybe it would be best to keep multiple backups around at once, in case stuff gets corrupted and recent backups inherit the corruption)

    For now I was thinking just of the simplest possible thing, which simply generates and downloads the SQL dump, overwriting the old one. But you’re right that it would be a good idea in the end to keep a few hanging around (e.g. one that updates only per week/month) rather than just one. Just one which is overwritten is much better than the current situation, though, and should protect us from many/most scenarios.

    About how big would a single backup be?

    At the moment it’s 2.3GB. If that size is problematic for people, we can look into a more complicated (but also more risk-prone) alternative (e.g. something which does a diff between a particular SQL dump and the current database, and makes only those changes to the previous SQL dump).
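
    (For illustration only: a client-side cousin of this idea would be to keep binary deltas between consecutive dumps rather than full copies, e.g. with a tool such as xdelta3; the file names below are hypothetical.)

    # store only a delta against the previous full dump
    xdelta3 -e -s previous.sql current.sql delta-$(date +"%Y-%m-%d").vcdiff
    # reconstruct a later dump from the base dump and its delta
    xdelta3 -d -s previous.sql delta-2018-09-17.vcdiff reconstructed.sql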

    • CommentRowNumber11.
    • CommentAuthorAlizter
    • CommentTimeSep 18th 2018

    Richard, what about having a compressed dump? You could dump and compress weekly, so that many more people could download backups?
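
    A minimal sketch of what a weekly compressed dump on the server could look like, assuming a MySQL database and placeholder database and path names (the thread does not describe the actual nLab setup):

    # sketch: produce a dated, compressed dump (could be run from a weekly cron job)
    mysqldump --single-transaction nlab_production | gzip > /var/backups/nlab-$(date +"%Y-%m-%d").sql.gz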

    • CommentRowNumber12.
    • CommentAuthorRichard Williamson

    I’ve finally created a user for Mike and given him instructions on how to do it. I am running a daily cron job to make backups myself; I hope Mike will be able to do something similar. For the moment I am not overwriting each day; I am managing it manually.

    I’ll see whether the process goes smoothly for Mike, and then I can pass on instructions to Urs and anybody else who would be willing to do this.

    Re #11: Good ideas! But I’m a little short of time at the moment, so I’ll keep the current simple design for now. I am happy to create a user for anybody willing, though, so that they can do it; it is not difficult.

    • CommentRowNumber13.
    • CommentAuthorMike Shulman
    • CommentTimeJan 10th 2019

    Thanks Richard! The script you sent me works perfectly; I am setting up a cron job now. I also added a line to the script that gzips the dump once downloaded, which reduces the size to ~600M. I think I will also set up a cron job to delete sufficiently old backups, since I don’t want to run out of disk space.

    • CommentRowNumber14.
    • CommentAuthorRichard Williamson
    • CommentTimeJan 10th 2019

    Excellent! Thanks very much! Could you possibly paste your scripts here (minus the authentication header)? I think others might like that behaviour too.

    • CommentRowNumber15.
    • CommentAuthorMike Shulman
    • CommentTimeJan 10th 2019

    The basic daily backup.sh script is a slight modification of the one you sent me:

    SQL_DUMP_ID=$(curl -X POST -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump)
    OUT_FILE=$HOME/nlab-backups/daily/$(date +"%Y-%m-%dT%H:%M:%S").sql
    curl -H "Authorization: Basic ((AUTH))" https://ncatlab.org/sqldump/$SQL_DUMP_ID > $OUT_FILE
    gzip $OUT_FILE
    

    where ((AUTH)) should be replaced with the personal authentication hash. The next script, link-to.sh, can be run weekly, monthly, etc. with a command-line parameter to hard-link the most recent daily backup into a weekly or monthly directory of longer-ago backups:

    FILE=$(ls -t $HOME/nlab-backups/daily | head -1)
    ln $HOME/nlab-backups/daily/$FILE $HOME/nlab-backups/$1
    

    Finally the daily delete-old.sh script deletes all but the most recent few of each backup:

    find $HOME/nlab-backups/daily/ -mtime +7 -exec rm {} \;
    find $HOME/nlab-backups/weekly/ -mtime +56 -exec rm {} \;
    find $HOME/nlab-backups/monthly/ -mtime +365 -exec rm {} \;
    

    The idea is that I want to keep some backups from up to a year ago, but those don’t need the finer-grained day-by-day nature of backups from the past week. Here’s my crontab:

    0 2 * * * /home/shulman/nlab-backups/backup.sh >/dev/null 2>&1
    20 2 * * 0 /home/shulman/nlab-backups/link-to.sh weekly >/dev/null 2>&1
    40 2 1 * * /home/shulman/nlab-backups/link-to.sh monthly >/dev/null 2>&1
    0 3 * * * /home/shulman/nlab-backups/delete-old.sh >/dev/null 2>&1
    

    I tested all this as much as I could, but there could still be bugs in it since not even one day has actually gone by. (-:

    All the scripts should begin with #!/usr/bin/bash of course, but the markdown parser here has trouble with that. (-:

    • CommentRowNumber16.
    • CommentAuthorRichard Williamson
    • CommentTimeJan 13th 2019
    • (edited Jan 13th 2019)

    Great, thanks very much! I can see from the logs that your cron job is downloading an SQL dump once a day. That’s really good. I’ll create a user for Urs as well when I get a chance.

    • CommentRowNumber17.
    • CommentAuthorRichard Williamson
    • CommentTimeJan 22nd 2019
    • (edited Jan 22nd 2019)

    Have created a user for Urs now, and sent instructions over email. I offered him your help as well as mine, Mike, with setting up a cron job :-).

    If anybody else would like to help out, just let me know. The more the merrier: the more people we have with cron jobs running at different times of day, the better our chances of losing as little data as possible in a ’crisis’.
