There have been two empty pages created lately, both anonymous. They are at Riemann sphere and quasi inverse. It looks as if both were attempts to add something that was aborted.
I've moved them to category: empty. People should remember to use these up when creating new pages (although I myself am very bad at this).
I don’t understand. People should remember to use what up? Or perhaps I should ask: people should remember to do what when creating new pages?
The idea is that when about to create a new entry titled “foo” one should go to one of the existing entries in category: empty and rename it to “foo”.
I am not sure, though, if I understand why we should do that. Is it to save the server from handling a few more bytes? (Does it even have that effect?) To me the negative effect seems to be that the history of the new entry starts out weird for newcomers.
Imagine the (maybe unlikely, but possible) case that a page that used to be a spam page and was then moved to category: empty finally ends up being revamped into a genuine entry with so much cool information that it gets cited in published papers. Then whenever researchers look at its history (as the HowTo page tells them to do in order to see which version to cite), they will see the old junk at the beginning of it. That seems unnecessarily awkward.
That's a good point, Urs. Maybe we should just leave these alone.
Except that then we end up with an ever increasing number of empty pages - we’re currently at 23. We decided on this policy of using empty pages because some didn’t like deleting anything.
It’s not to save bytes - deleting would be the right solution for that.
One possibility would be to move them to another web. It’s just a single field in the database, I think. So we could dump all the empty pages on either nlabmeta or doriath and then use them there when we need a new page since neither of those webs will produce citable material.
It’s not to save bytes
What is it for, then?
Simply to avoid having random pages around. There’s no direct harm. It’s just a bit messy.
I find it messier to have pages that are not orphaned (they receive links) but contain spam or other weirdness in their history than to have orphaned spam/empty pages. Eventually orphaned pages can/should be deleted anyway. We should remove spam/weirdness from the system and not work hard to keep it around.
I agree that spam pages which have no other useful history should be deleted (or moved to an invisible or harmless history-preserving alternative web, as Andrew suggested, which is what historians of the effort like Toby would prefer).
On the other hand, I disagree that the spam in the history is the main part of the mess that spam pages create! First of all, Urs, no page has more spam in its history (appearing there multiple times) than HomePage (and a few closely related, frequently edited pages)! And that page is cited and used far more than all the orphaned pages together. The mess caused by bad pages comes from the fact that they appear in, are generated and transmitted with, and are printed on the screen in every download of the Recently Revised page by humans and robots, slowing down our eyes and our robots, adding more requests, and making it harder to clean up various lists of pages. They have to be processed by every backup of the database, by every robot analysis, by every viewer of a history (beware that some people who contribute to the Lab have vision problems; scrolling through odd material worsens the problem), and by every viewer of the list of recent pages. They also give a wrong impression of how many pages we actually have. Finally, if we ever employ a programmer to help us with some task, the estimated number of pages may affect the price of the task, regardless of whether the pages are useful or remnants of spam.
Slowdown is a real issue. But what would add to the overall slowdown is the idea that in order to create a page I need to go to some collection of existing entries, find one, rename it, then fight the cache bug, etc. I already need to jump through plenty of hoops when working here, I don’t need another one.
Let’s just delete these pages and be done with it.
Yes, Urs, I agree: if dealing with the spam issue costs us effort of the same order of magnitude as the spammer’s (or even an order more!), that means that we value his/her time more than ours!
The existence of a few dozen empty pages will have a negligible effect on any price for programming the nLab, with its thousands of pages, and likewise a negligible effect on speed. They will hardly ever be seen by eyeballs either, and they will have a negligible effect on any robots that see them. The only way that one would notice them without looking for them is in an alphabetical list, if one happened to be in the ‘e’ section. Otherwise, each of these shows up on Recently Revised just once, when first blanked. Doing anything else to them (other than making something productive out of them) either just adds another unimportant entry to Recently Revised and other records or logs, or at least should do so (it's one of the major remaining flaws of Instiki that some changes, especially page renames, are not recorded there or even in the pages' own histories).
As for spam in the history of the Main Page, the existence of spam there, where it can hardly be helped, has nothing to do with the existence of spam in the history of previously empty pages. The latter doesn't bother me (and it would be even better if the page rename was in the history so that one could see immediately what happened); but if it bothers somebody else, then it will bother them whether or not there is spam elsewhere such as in the history of the Main Page.
It is really distasteful that people want to pretty up the history of the nLab, or of anything else for that matter, by deleting data. (It's another matter when it's a significant amount of data or when it takes effort to preserve the data; but here we're talking about a tiny amount that would take effort to hide.) The history was whatever it was, and there is no sense in whitewashing that.
It is really distasteful that people want to pretty up the history of the nLab, […] there is no sense in whitewashing that.
I think you go a bit overboard here. The history of the Lab also includes me interrupting editing an entry to go to the bathroom. We don’t record that, because, just as with spam, while it is part of the whole universe progressing in time, it is not useful to keep records of.
But I think if a discussion of 23 empty pages now leads us to calling each other distasteful, we had better quit.
The existence of a few dozen empty pages
This is just the present state. It is likely that at least a few times in the future we will have massive attacks with hundreds of pages affected or created. I am surprised it has not happened yet.
it’s one of the major remaining flaws of Instiki that some changes, especially page renames, are not recorded there or even in the pages’ own histories
I agree.
want to pretty up the history of the nLab
I think, on the contrary, that it is prettifying to say that we have (as the actual numbers say) 7000+ nLab pages when, among those, several hundred are sandboxes, empty pages, spam pages or “history” pages (I still think “obsolete” in the title would be better than “history”, since one would like to search for history in the sense of the entries on the history of mathematics; it is annoying that such a search returns the wrong entries). Such pages can be preserved in one of the side Labs, or in a yearly archival backup, leaving the Lab free of trash and functional.
One of the reasons why I am no longer enthusiastic about organizing and contributing material to the announcements on the content of the Lab after it reaches landmark numbers of entries (remember the 3000+ digest?) is that I do not know how many content pages we actually have. Thus the “negligible” pages do mess with our real motivations, time and happiness.
I do not think that you (or any other person) are distasteful, Urs.
If we automatically recorded your going to the bathroom, I would not want to delete that. Except that if we automatically recorded everything like that, then it would probably be a significant portion of what we record, and so it would be worth deleting it after all. However, we do not already record that, so it's not worth trying to record it now. (Instiki's pattern of not recording intermediate edits over the course of half an hour is similar. I'm not really happy with that, but, in the absence of a preview, it solves a problem.)
It is likely that at least few times in future we will have massive attacks with hundreds of pages affected or created.
If this happens, then it would probably be best to delete the created pages and to delete the affected entries in the history, if we have good tools to do this. That's because this would now be a significant part of the history, so it would be worth the effort to figure out how best to clear it up and worth setting the precedent of doing so.
I think, on the contrary, that it is prettifying saying that we have (as the actual numbers say) 7000+ nLab pages while we have, among those, several hundred pages among those which are either sandboxes, empty pages, spam pages and “history” pages
The empty pages aren't significant here. The meta pages (which include the Sandboxes) are, however. Since we don't want to delete those, it would be good to have a script that counts pages while leaving those (and the “history” pages) out of the count; then it should leave out the empty pages too. Other items to leave out are the contents and SVG pages intended for inclusion; the way that Instiki works, we can't put these in categories (like meta and empty), but they can be identified by their names.
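A page-counting script along the lines suggested above might look roughly like this. It is only a minimal Python sketch, assuming page records with a name and a set of categories are already available (for example from a database dump). The `Page` type, the excluded category names and especially the name patterns for history, contents and SVG pages are hypothetical placeholders, not Instiki's actual schema or the nLab's actual naming conventions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    categories: set = field(default_factory=set)


# Hypothetical: categories whose pages should not count as content.
EXCLUDED_CATEGORIES = {"meta", "empty"}

# Hypothetical name patterns for non-content pages (history pages,
# contents pages and SVG pages meant for inclusion). The real naming
# conventions on the nLab may differ.
EXCLUDED_NAME_PATTERNS = [
    re.compile(r"> history$"),
    re.compile(r"- contents$"),
    re.compile(r"\bSVG\b"),
]


def is_content_page(page: Page) -> bool:
    """True unless the page is in an excluded category or has an excluded name."""
    if page.categories & EXCLUDED_CATEGORIES:
        return False
    return not any(pattern.search(page.name) for pattern in EXCLUDED_NAME_PATTERNS)


def count_content_pages(pages) -> int:
    """Count only the pages that look like genuine content pages."""
    return sum(1 for page in pages if is_content_page(page))


if __name__ == "__main__":
    sample = [
        Page("monad"),
        Page("Sandbox", {"meta"}),
        Page("some blanked page", {"empty"}),
        Page("monad > history"),
        Page("category theory - contents"),
        Page("adjunction SVG"),
    ]
    print(count_content_pages(sample))  # prints 1
```

Keeping the exclusions as data (a set of categories plus a list of patterns) means that the “obsolete” naming proposed in this thread could be handled by editing a single pattern.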
I still think “obsolete” in the title would be better than “history” regarding that one would like to search for history in the sense of the entries on history of mathematics; it is annoying that such a search would give wrong entries
Yes, this is a good idea.