Welcome to nForum
    • CommentRowNumber1.
    • CommentAuthorAndrew Stacey
    • CommentTimeJul 12th 2009

    There's a discussion on latest changes about making an html copy of the n-lab downloadable every now and then. Jacques has disabled the official export html facility for reasons of severe server overload.

    I'll need to experiment, but I think that what Zoran wants could be achieved with a simple wget command which would also have the advantage of making sure that all the links are correct (and wouldn't require any changes to Instiki). I presume that, as Zoran wants to be able to do stuff offline, this would be desirable. What else would be useful for this?

    • CommentRowNumber2.
    • CommentAuthorAndrew Stacey
    • CommentTimeJul 13th 2009
    • (edited Jul 13th 2009)

    Now that I've read the wget manual, this does seem eminently possible. The exact command depends a little on exactly what should be downloaded. Presumably what one wants is all the existing pages in their most recent forms. Thus one doesn't care about all the fiddly bits concerning histories and edit boxes. What one wants is everything of the form:

    http://ncatlab.org/show/some+random+page
    

    plus everything required to display that page properly (icons, stylesheets, etc). To play nice with the server, a page should only be downloaded if the server copy is newer than the local copy.

    First step is to get a list of all the pages, we do this by downloading the All pages page and extracting a list of the pages (via a perl script). We feed this back into wget as a list of pages to get (using the -i option). For each downloaded page we ensure that we have the required extras to display it correctly (-p option). We convert the links so that they work correctly (-k option): links to downloaded files point to downloaded files, links to non-downloaded files point to non-downloaded files. We use time-stamping to only get new pages (-N) but, because we're doing a little post-processing, we need to keep the original files for time-stamping to work correctly (-K). Files are also saved with an html extension (-E) since, no matter how they were generated, they are now boring html (well, okay, xhtml+mathml+svg) files.

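    # Fetch the list of all pages, extract the page URLs, and feed them back to wget.
    # -k convert links, -K keep the originals, -E save with an html extension,
    # -p fetch page requisites, -N time-stamping
    wget --output-document=- http://ncatlab.org/nlab/list \
    | perl -lne '/<div id="allPages"/ and $print = 1;
                    /<div id="wantedPages"/ and exit;
                    /href="([^"]*)"/ and $print and print "http://ncatlab.org$1";' \
    | wget -i - -kKEpN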
    

    I haven't tried this yet. I would suggest that, at least the first time anyone does this, it be done at light usage times. Not completely sure when they are, though.

    • CommentRowNumber3.
    • CommentAuthorTobyBartels
    • CommentTimeJul 13th 2009

    First step is to get a list of all the pages, we do this by downloading the All pages page and extracting a list of the pages (via a perl script).

    Or if you want the Markdown source as well, you can download the Markdown export and get the titles from that. I'm inclined to guess that this would be cleaner. But hey, if you wrote the script already …

    light usage times. Not completely sure when they are

    In my experience, weekends (dang, just missed one!) and between 7:00 and 9:00 UTC.
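
    For anyone scripting the download, a small guard along these lines could check for those periods before starting. This is only a sketch: the function name is hypothetical, and the time window is just the rough estimate given above.

```shell
# Succeed when the given UTC hour (0-23, optionally zero-padded) and ISO
# weekday (1 = Monday .. 7 = Sunday) fall in the light-usage periods above.
is_light_usage() {
    hour=${1#0}      # strip one leading zero, so "08" becomes "8"
    weekday=$2
    # Weekends count as light usage all day.
    if [ "$weekday" -ge 6 ]; then
        return 0
    fi
    # On weekdays, only 07:00 up to (but not including) 09:00 UTC.
    [ "$hour" -ge 7 ] && [ "$hour" -lt 9 ]
}

# date -u reports UTC; %H is the hour and %u the ISO weekday.
if is_light_usage "$(date -u +%H)" "$(date -u +%u)"; then
    echo "light usage period: OK to start the mirror"
else
    echo "busy period: try again later"
fi
```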

    • CommentRowNumber4.
    • CommentAuthorzskoda
    • CommentTimeJul 14th 2009
    The script does work properly. I did one successful run from an old Red Hat 9 machine. Thank you very much for thinking the problem through! I should warm up and start writing similar code myself when needed (I used to be a good programmer around 1998, with a focus on languages and compiler design, but practically stopped doing it soon after that). Since 14.9.2001 (the date when I flew to a math conference where I gave my first conference talk; in the months just after that I was finishing the last chapters of my thesis and so on, and it never stopped), I have had almost no time for anything but my math/physics career, and my computer skills have degenerated to about 10% of what they were, plus the technology went on developing without me following it... Maybe it will all come back some day...

    Zoran
    • CommentRowNumber5.
    • CommentAuthorAndrew Stacey
    • CommentTimeJul 16th 2009
    • (edited Jul 16th 2009)

    I'm amazed that a script did what it was intended to do the first time! Mind you, the real test is when you run it the second time and see if it genuinely only downloads the pages that have been updated.

    I can empathise with the hacking/mathing dilemma. The similarity in both is so strong it's sometimes hard to allocate time appropriately. The old "why study" springs to mind:

    I do maths, therefore I hack
    The more I hack, the less time I have
    The less time I have, the less I do maths
    So why do maths?

    Okay, not perfect. But if I added polishing poetry to my day I'd never get anything done.

    • CommentRowNumber6.
    • CommentAuthorzskoda
    • CommentTimeJul 28th 2009
    This is totally opposite to Littlewood's point of view that really creative mathematics can be done only if it is not done for long hours. He says at most 5 hours of creative thinking a day; with more, things degenerate.

    Anyway, I cannot find basic facts on the forum about what kind of system you have: what is your underlying server, is it the default WEBrick or Apache (ruby on rails works with both)? Which other principal applications are served?
    • CommentRowNumber7.
    • CommentAuthorAndrew Stacey
    • CommentTimeJul 29th 2009

    Since you are talking about WEBrick and ruby on rails, I presume that you are talking about the Instiki process underlying the n-lab, rather than what's running this forum.

    At the moment, Instiki uses mongrel as its server and is proxied through a lighttpd server. The other relevant fact is that it uses sqlite3 for its database. There's been no discussion about shifting from lighttpd to apache, but when we migrate we will shift from sqlite3 to mysql. There are no other applications served on the n-lab than the instiki process underlying the n-lab (and the other labs, but it's the same instiki process).

    • CommentRowNumber8.
    • CommentAuthorzskoda
    • CommentTimeAug 3rd 2009
    Thank you for the answers. I am a bit surprised by their content, e.g. as mysql is the default choice in almost all the ruby on rails manuals I found on the web.
    • CommentRowNumber9.
    • CommentAuthorzskoda
    • CommentTimeAug 7th 2009
    Today, to have a new version before going partly offline (partial vacation), I ran the script from above on another linux machine; it took wget about 40 minutes to execute with all the pauses included. Here is the outcome, tarred and gzipped, at only a bit over 10 MB, while the unpacked .tar version is, I think, about 77 MB:

    http://www.irb.hr/korisnici/zskoda/nlab.tar.gz
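
    For anyone wanting to package their own copy the same way, the tar-and-gzip step might look like the sketch below. The function name and the demo directory are hypothetical; in practice the first argument would be whatever directory wget wrote the pages into.

```shell
# Pack a mirror directory into a gzip-compressed tarball and report both sizes.
pack_mirror() {
    dir=$1
    archive=$2
    tar -czf "$archive" "$dir" || return 1
    # du -sk reports disk usage in kilobytes; awk keeps only the number.
    printf 'unpacked: %s kB\n' "$(du -sk "$dir" | awk '{print $1}')"
    printf 'archive:  %s kB\n' "$(du -sk "$archive" | awk '{print $1}')"
}

# Demonstration on a throwaway directory standing in for the real mirror.
demo=$(mktemp -d)
mkdir "$demo/site"
printf '<html>example</html>\n' > "$demo/site/page.html"
(cd "$demo" && pack_mirror site site.tar.gz)

# List the archive contents to confirm what was packed.
tar -tzf "$demo/site.tar.gz"
```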
    • CommentRowNumber10.
    • CommentAuthorTobyBartels
    • CommentTimeAug 7th 2009

    Thanks, it is very helpful when one person does the 40 min wget and then makes a nice compressed file for everybody else to download. (^_^)

    • CommentRowNumber11.
    • CommentAuthorAndrew Stacey
    • CommentTimeAug 10th 2009

    Zoran, Instiki uses sqlite3 as default because it can be done entirely internally. With mysql you need to set up two things: a mysql database and a rails app. With sqlite3 everything is in the rails app. So it is easier to set up, and for small installations there's not enough of a difference to make mysql the default. However, as we're finding out, there's a reason that mysql is preferred to sqlite3! We do intend to migrate soon.

    Thanks also for putting the download online. I like the idea of distributing the n-lab a little, not only to save bandwidth which we have to pay for but also to make it feel more like a distributed project (which it is). Somewhere (can't remember where off the top of my head) I've pondered about how to set up static mirrors of the n-lab ...