Cf. also database of categories.
Is anybody interested in a somewhat more flexible database? (I think handling big tables in Instiki would get really cumbersome.) With some software support it would be easier to handle large collections of categories. I created a prototype (the design is ugly and it has no links yet; grouping of properties and categories would be nice).
From the database page: “I remember someone at nLab could write the database in Ruby, is this offer still on?” So there were plans to do something like this?
PS: Okay, I have found this, well, there has not been a real result (regarding semantic wiki: that would probably be the best, but would require some complex software).
I know such a database: it’s called the nLab! When I need to know what the regular epimorphisms in Category Abc are, I go to Google and type
nLab "regular epimorphism" Abc
That gives me all that the nLab database already has on this. Sometimes I get the information that I need this way. Sometimes not. If not, I collect it elsewhere or work it out myself or something, and then add it to the nLab, so that I have it next time that I need it.
Hehe, yes, I already noticed that: Some consider structured data useless, some don’t. Well, personally I will slowly put some stuff in there, because I would like to get some overview, but of course, for most cases the nLab is much more appropriate (thanks for your awesome work, btw ;)). But if anybody is interested, let me know.
The nLab is probably “unstructured data” in the technical sense, but is it unstructured in the sense that you would actually need in practice? My impression is the opposite:
if in practice you need to know which properties are shared by which categories, a rigidly structured database that works like a telephone book will not be so useful. The most useful thing will be a search engine that takes the keywords which you believe are related to what you are looking for and spits out all the information that looks relevant. This is precisely what you get with nLab+Google.
(I had said something similar already last time that somebody started to build a “database of categories” here: to be really useful it needs to be very flexible.)
The main disadvantage of the nLab as a database currently is that, despite all the effort, it is still lacking so much information. But this is a problem that will be much, much worse if you try to single-handedly build a database from scratch. If you really need something with lots of pulldown menus etc., I’d suggest that the way to go would be to program it as an overlay for the nLab, something that harvests its information by doing keyword searches on the nLab.
Well, I do not claim to build anything “complete”, and yes, RDF would be more flexible. “Harvesting information by doing keyword searches”? How do you think that should be possible? The nLab mainly consists of running text, so it would require advanced computational understanding of the English language, which is far beyond the state of the art.
“Harvesting information by doing keyword searches”? How do you think that should be possible? The nLab mainly consists of running text,
I could ask back: how would a database of categories be useful that does not have this running text, by and large? I don’t see how there can be a useful dataset of categories that works just like a phone book.
The matter in question is just too complex for that. After all, a “database of categories” is pretty much a “database of mathematics”. Everything in math lives in one category or other.
What I can imagine as being useful is an overlay to the nLab, which maybe looks like a database to the user, if that’s desirable, that offers useful menus and checkboxes, then has some ideas about how to do gratifying searches for what seems to be the information requested, and then produces pointers to the subsections of nLab entries that have this information, maybe displayed in subwindows as GoogleBooks does it. Something like this. I am pretty sure that this would be useful, especially as the nLab grows.
I totally agree with you that datasets by themselves, without references to real mathematical explanations, are worthless, but I think some way to access the information systematically would be useful, too.
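To make the overlay idea a bit more concrete, here is a minimal sketch of the "keyword search that returns pointers to subsections" step. Everything in it is an assumption for illustration: `PAGES` is a hypothetical in-memory stand-in for page sources (a real overlay would have to fetch them from the wiki somehow), the sample sentences are made up for the example, and `find_sections` is not an existing tool. It only does plain substring matching per section, nothing like real language understanding.

```python
# Hypothetical stand-in for wiki page sources, keyed by page name.
PAGES = {
    "Ab": (
        "## Properties\n"
        "### Monos and epis\n"
        "Every epimorphism in Ab is a regular epimorphism.\n"
        "### Limits\n"
        "Limits are computed as in Set.\n"
    ),
    "Ring": (
        "## Properties\n"
        "### Monos and epis\n"
        "The inclusion of the integers into the rationals is an "
        "epimorphism that is not regular.\n"
    ),
}


def find_sections(pages, keywords):
    """Return (page, heading) pairs for sections whose text mentions every keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for name, source in pages.items():
        heading, body = None, []
        # The trailing sentinel heading flushes the last section of the page.
        for line in source.splitlines() + ["# end"]:
            if line.startswith("#"):
                text = " ".join(body).lower()
                if heading is not None and body and all(k in text for k in wanted):
                    hits.append((name, heading))
                heading, body = line.lstrip("#").strip(), []
            else:
                body.append(line)
    return hits
```

Calling `find_sections(PAGES, ["regular epimorphism"])` on the sample data points to the "Monos and epis" section of the hypothetical "Ab" page only. The "Ring" section is missed because the exact phrase does not occur there, which also illustrates the limits of plain keyword matching.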
“I am pretty sure that this would be useful, especially as the nLab grows.”
Yes, I think so, too. But the information has to be stored somewhere; automatic extraction is really, really unrealistic (there is a project extracting information from Wikipedia infoboxes, and even that is challenging). How do you think it should be stored?
But the information has to be stored somewhere; automatic extraction is really, really unrealistic (there is a project extracting information from Wikipedia infoboxes, and even that is challenging). How do you think it should be stored?
We have on the nLab the wiki-category called “category: category”. This labels those pages that discuss a specific category.
One thing one could do is start adding to these pages a standardized template of Properties-subsections, such as
## Definition
{#Definition}
Definition. The category $C$ has as objects ... and as morphisms ...
## Properties
### Monos and epis
{#MonosAndEpis}
Theorem. A monomorphism in $C$ is a morphism such that... An epimorphism is a morphism such that...
Proof. ...
### Limits and colimits
{#LimitsAndColimits}
Theorem. Limits in $C$ are computed by...
Proof. ...
And so forth. This would provide information in a way that is all of: robust, human readable, equipped with relevant background information, indexable by machines.
To some extent we have been doing this anyway. But we never agreed on a fully standardized formatting/labelling. We could just do that. Then you or others could, I suppose, easily write software that searches the nLab for these standardized section headers / section labels.
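As a rough illustration of how such software might read the standardized labels, here is a minimal sketch. It assumes that each labelled section puts its `{#Label}` anchor on a line of its own directly under the header, as in the template above; `PAGE` is a hypothetical page source, and `sections_by_anchor` is an illustrative helper, not an existing nLab or Instiki feature.

```python
import re

# Hypothetical page source following the template sketched above.
PAGE = """\
## Definition
{#Definition}
Definition. The category $C$ has as objects ... and as morphisms ...
## Properties
### Monos and epis
{#MonosAndEpis}
Theorem. A monomorphism in $C$ is a morphism such that...
### Limits and colimits
{#LimitsAndColimits}
Theorem. Limits in $C$ are computed by...
"""

HEADER = re.compile(r"^#+\s+\S")                    # a Markdown header line
ANCHOR = re.compile(r"^\{#([A-Za-z][\w-]*)\}\s*$")  # a {#Label} line


def sections_by_anchor(source):
    """Map each anchor label to the body text of the section it labels."""
    sections = {}
    current = None  # label of the section being collected, if any
    for line in source.splitlines():
        if HEADER.match(line):
            current = None  # a new header closes the previous section
            continue
        anchor = ANCHOR.match(line)
        if anchor:
            current = anchor.group(1)
            sections[current] = []
            continue
        if current is not None:
            sections[current].append(line)
    return {label: "\n".join(body).strip() for label, body in sections.items()}
```

With this, `sections_by_anchor(PAGE)["LimitsAndColimits"]` returns the text of that subsection, while unlabelled sections such as "Properties" are simply skipped. Whether the labels should live in the running text or elsewhere is a separate design question.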
The big advantage that I see is that we don’t duplicate effort too much, with some people storing information in a wiki, others storing the same kind of information in some other database. There are not that many people who do that anyway. We should make sure that they benefit from each other’s efforts, and not duplicate them.
What do you think?
{#something} is an anchor?
Yes. It seems to serve as an identification just as well as
<!--#something-->[…]<!--#/something-->.
I wouldn’t mind at all. You can try the Sandbox, but I suppose for most purposes you could just as well work in the relevant entry itself.
Thanks for looking into this! I believe if we can settle on a good general scheme for “indexing” material in nLab entries in a standardized way, that would eventually be quite useful.
You are welcome to use the Sandbox or to create Sandbox/Set to avoid interruptions from other people. Of course you can also use Set itself when you have something that you like.
It’s not exactly integrated with the text, but at least it gives people a place to put the data that your extractor can use.
The User, thanks for taking the initiative on this! Do you have a name? Around here we usually use our real names… and “The User” is a more than usually uninformative handle. (-:
I think metadata is an excellent choice for how to store this sort of information. As you mentioned above, I have advocated before for a semantic-wiki style approach to this sort of thing, and this seems fairly similar. I actually prefer that exhaustive data of this form not necessarily be in the main text of the page; I prefer that pages don’t get bloated unnecessarily. A very long page with lots of lists of information is often less useful to me, as it is overwhelming and hard to find things in, than a shorter one which mentions only the most important facts with references to where other facts can be found.
Can I ask that we make the names of the properties more descriptive? I realize that they have descriptions at Metadata properties, but I think the names themselves could stand to be more descriptive and memorable as well. Perhaps they could also adhere to the general nLab convention of lowercase names and omit unnecessary hyphens (e.g. “coproducts” rather than “Co-Products”).
Also, is there any way that the table at Metadata properties could be made human-readable as-is, without needing to view the source?
I would probably lean towards using whole words like “extremal epimorphisms” (or “extremal-epimorphisms” or “extremal_epimorphisms” if spaces are a problem), or maybe the shorter “extremal epis”. Anyone else have an opinion?
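For whatever naming scheme gets chosen, a small normalizer could keep the property names consistent. The following is only a sketch under assumptions: `normalize_property_name` is a hypothetical helper, and treating only the hyphen after a leading “co” as “unnecessary” is my guess at the convention, not something that has been agreed on.

```python
import re


def normalize_property_name(name, separator=" "):
    """Lowercase a property name and join its words with the given separator.

    Assumption: only the hyphen after a leading "co" prefix is dropped
    (e.g. "Co-Products" becomes "coproducts"); other hyphens are kept.
    """
    words = re.split(r"[\s_]+", name.strip().lower())
    words = [re.sub(r"^co-", "co", word) for word in words if word]
    return separator.join(words)
```

For example, `normalize_property_name("Co-Products")` gives `"coproducts"`, and `normalize_property_name("Extremal Epimorphisms", separator="_")` gives `"extremal_epimorphisms"`, so the choice between spaces, hyphens and underscores becomes a single parameter.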