• CommentRowNumber1.
• CommentAuthorMike Shulman
• CommentTimeAug 28th 2018

Let’s discuss this in its own thread.

• CommentRowNumber2.
• CommentAuthorMike Shulman
• CommentTimeAug 28th 2018

I think my own opinion is that we should delete spam, but short of actual spam (even including e.g. crank self-promotion, as well as simple mistakes and so on) I would rather just blank the page and possibly rename it (or make it into something real with the same name, as we just did with Curry’s paradox). I would also rather not re-use such blank pages, because that results in confused edit histories.

• CommentRowNumber3.
• CommentAuthorRichard Williamson
• CommentTimeAug 28th 2018
• (edited Aug 28th 2018)

The main thing for me is what I wrote in #4 here, namely to ask that pages be created in the intended manner, not by editing an existing page. Amongst other things, the semantics are wrong: an edit to an existing page cannot be programmatically identified as a page creation, for example.

Though it is of lesser importance to me, leaving empty pages in the wiki seems hack-ish and not good from a user perspective, and I would prefer that we not do it. One might argue that they should not actually be deleted, just marked as not to be shown, but that would require a bit of work to implement, and for now I think it is working OK to delete them manually.

In general, immutability is very good, and I definitely agree that we should be very careful when deleting, that only a very few people should have the power to do it, and that that power should be exercised with a great deal of caution. But I would suggest that we guard against mistakes, e.g. by making an API which makes an SQL dump before deletion, rather than not deleting at all. I think that many of us are keen to dust and polish the nLab, which is, or nearly is, 10 years old now, and I think that this is part of that. (I am actually thinking of suggesting that at some point we slowly go page-by-page through the nLab, cleaning things up, but I think we should hold off on that for now, while we are making a lot of changes.)
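A minimal sketch of the "dump before deletion" idea mentioned above, for illustration only. It uses Python's standard `sqlite3` module as a stand-in for whatever database the nLab actually runs on; the `pages` table, its columns, and the function names `delete_page_with_dump` and `demo` are all hypothetical.

```python
import sqlite3


def delete_page_with_dump(conn, name):
    """Snapshot the database as SQL text, then delete the named page.

    Returns the dump so the caller can store it somewhere safe.
    Raises KeyError if the page does not exist, so nothing is touched.
    The table and column names are assumptions made for this sketch.
    """
    row = conn.execute("SELECT id FROM pages WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    # Take the snapshot first: iterdump() yields the SQL statements
    # needed to recreate the database in its current state.
    dump = "\n".join(conn.iterdump())
    conn.execute("DELETE FROM pages WHERE id = ?", (row[0],))
    conn.commit()
    return dump


def demo():
    """Build a throwaway in-memory database and delete one page from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT UNIQUE, content TEXT)"
    )
    conn.execute("INSERT INTO pages (name, content) VALUES ('junk page', 'buy now')")
    conn.commit()
    dump = delete_page_with_dump(conn, "junk page")
    remaining = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    return dump, remaining
```

The point of the ordering is that the deleted content is still recoverable from the returned dump even though the row is gone from the live table. A real version would presumably write the dump to a file and restrict who can call it.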

As a last point, one of the things that I like about the nLab is that there is very little prescription, rule-making, and hair-splitting. There was maybe a little more of a tendency to that in the early days (I remember being a bit put off by it at the time, when I was an observer but not active), but things have evolved so that things are typically handled amicably and quickly on a case-by-case basis. I would definitely like to retain that! Thus I’m all for discussing things and for informal guidelines, but would prefer not to take it further than that.

• CommentRowNumber4.
• CommentAuthorTim_Porter
• CommentTimeAug 28th 2018
• (edited Aug 28th 2018)

I recall (I hope correctly) that on at least one occasion we had a bad spam attack on personal webs. I think that not everyone checked their personal webs and some of the inactive ones may have some spam in them. This relates, therefore, both to the deletion of inactive personal webs and here to ’emptifying’ spam pages.

• CommentRowNumber5.

Yes, I think this is the reason that many of the personal webs are marked as ’published’, as currently discussed here. I definitely agree that we have to have a good way to deal with spam. Handling spam or erroneously created pages, etc, is not my favourite aspect of being responsible for the software of the nLab, so I am all for discussing this to improve how we handle it!

• CommentRowNumber6.
• CommentAuthorMike Shulman
• CommentTimeAug 28th 2018

I don’t see any disadvantage to keeping non-spam blank pages around: it avoids losing information, and is less work. Surely disk space is not an issue.

• CommentRowNumber7.
• CommentAuthorTobyBartels
• CommentTimeAug 29th 2018

Reusing the blank pages is not important to me. Some people wanted us to do that to prevent a build-up of blank pages, but it never really caught on (I don't always remember to do it), and that being the case, it seems that it was probably never necessary. Since it causes problems, I'll stop doing it.

I really don't want to start deleting things to make the Lab look polished. This desire is part of what led Wikipedia to clamp down on new users about a decade ago, with the result that this graph obtained an inflection point that has never been reversed. (Actually, you probably get a more meaningful graph if you ask Wolfram|Alpha for it and then click the button for a log scale. There's no longer an inflection point, but instead there's a distinct change of slope in a nearly piecewise-linear graph, and it's at a date that fits better with my memory too.) An active wiki is always unfinished, and so it should appear.

Putting blanked pages into category:empty allows us to exclude them when doing anything like counting pages or running other statistics. Google and the like have been instructed by robots.txt to ignore their old versions. And the ordinary user never sees them without going to look for them. The only thing that really makes me unhappy is that they can't be found under their original names; I would really like to have a way to find all pages ever published under a given name and to keep track of all names under which a given page was ever published. With time stamps. I realize that this would be a lot of work, however; but we should still do whatever we can that doesn't involve such work.

To me, it's important to be able to find what I want to find within the wiki itself, without using any superpowers. Having the information offsite in a data dump is not enough; it means that I have to bother Richard if I want to look at something on a whim, and it means that casual users of the site have no idea who to ask or if there is anything at all to be done.

Have we been deleting spam pages? How many of these do we get? If it's a lot, then we probably really do need to delete them, but I wouldn't want to delete anything more than that. If it's a rare event, then I'd like to delete nothing. (Our main line of defence against spam, as I understand it, has been a filter, catching spam before it's ever posted. The filter has occasionally been overzealous, but at least if it does something wrong, then you know about it right away.)

• CommentRowNumber8.
• CommentAuthorTobyBartels
• CommentTimeAug 29th 2018

It appears from this comment that we're not only deleting spam but even deleting things that aren't clearly spam! This makes me very sad.

The page reported in that comment apparently had no real content; but in the next comment we're deleting someone's attempt to write out a proof!? It's not clear from the description there what was supposed to have been proved, maybe that a pseudometric space is Hausdorff, which is pretty basic. But we do write out proofs of such things from time to time; Urs has been making pages dedicated to proofs of basic facts, although this doesn't fit his naming scheme.

Whether there was anything worthwhile there or not, what really upsets me is that I can't just go and look. If it had been blanked and hidden away in category:empty, then this would be equivalent to deletion as far as the ordinary user goes. But because it was deleted, I have to ask Richard to dig up his SQL dump to tell me what was going on!

• CommentRowNumber9.
• CommentAuthorUrs
• CommentTimeAug 29th 2018

Let’s tone it down a little. Richard has been offering a tremendous amount of his help, time and expertise. In this case he has been making a pretty self-evident cleanup. I can understand the point of the hyper-careful approach that you are promoting, but there is room between finding that reasonable and becoming “very sad” if it is not met.

• CommentRowNumber10.
• CommentAuthorRichard Williamson
• CommentTimeAug 29th 2018
• (edited Aug 29th 2018)

Reusing the blank pages is not important to me… Since it causes problems, I’ll stop doing it.

Thanks very much for understanding!

Regarding empty pages and spam/inappropriate content, my personal preference remains the same: I myself like these kinds of things to be cleaned up, but I will of course follow the majority preference.

deleting things that aren’t clearly spam

I maybe did not phrase things optimally in the other thread where I described what was done. What I meant was that the page might have been created accidentally, or through some misunderstanding; I would of course not remove anything if there were any question of the content being appropriate. I do keep a record of what I remove, mostly in order to keep track of the IP addresses in case they need blocking. For the record, the page removed that you are referring to had the following as the title, and the content was an unformatted (copied and pasted, I expect, or maybe done by a machine) proof of this. The author was ’joeljoeljoel’.

'(X,d) is said to be a pseudometric space if we change the first assertion of a metric by: d(x, x) = 0. (It means that one may have d(x, y) = 0 for distinct values x ≠ y). Say why a pseudometric space will not be Hausdorf',


I would really like to have a way to find all pages ever published under a given name and to keep track of all names under which a given page was ever published.

I completely agree with this. It is on the Technical TODO list (nlabmeta), item 3) currently.

• CommentRowNumber11.
• CommentAuthorMike Shulman
• CommentTimeAug 29th 2018

Handling spam or erroneously created pages, etc, is not my favourite aspect of being responsible for the software of the nLab, so I am all for discussing this to improve how we handle it!

Here’s a proposal. I don’t know how technically hard or feasible it is, but just brainstorming.

At the bottom of every page (on every web, even) we could have a link saying “Mark this page as junk or spam”. When it is clicked, we ask the user (as a check against accidental clicks) “Are you sure? This entire page will be moved to the trash and disappear.” If they confirm, the page is moved (along with its edit history) to a separate web called “trash”, possibly renamed to avoid clashes in pagenames from different webs, and possibly edited to replace its current text by something like “This page was deleted from WEB on DATE by IP” (so that someone browsing the trash web has to click “history” to view the actual deleted version). We could also mark the entire trash web as non-indexable in robots.txt.

It would be nice to have an easy “undelete” option too, but I’m slightly wary of just having an “undelete” button on the trash web accessible to everyone, since it might result in delete-undelete wars with spammers. However, we could try it and see. At least there could be an “undelete” script that is easy for an administrator to run.
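A rough in-memory sketch of the trash-web proposal above, only to make the steps concrete. Everything here is hypothetical: the `Wiki` and `Page` classes, the `web:name` renaming scheme, and the use of a plain list as the edit history are illustrative assumptions, not how the nLab software stores pages.

```python
from dataclasses import dataclass, field
from datetime import date

TRASH = "trash"  # hypothetical name of the separate web


@dataclass
class Page:
    web: str
    name: str
    revisions: list = field(default_factory=list)  # oldest first
    origin: tuple = None  # (web, name) before trashing, if any


class Wiki:
    def __init__(self):
        self.pages = {}  # (web, name) -> Page

    def add(self, web, name, text):
        self.pages[(web, name)] = Page(web, name, [text])

    def move_to_trash(self, web, name, ip, today):
        """Move a page and its whole history to the trash web."""
        page = self.pages.pop((web, name))
        page.origin = (web, name)
        # Prefix with the source web so that equal page names
        # from different webs do not clash inside the trash.
        page.web, page.name = TRASH, f"{web}:{name}"
        # The notice becomes the current text; the deleted content
        # stays one step back in the history.
        page.revisions.append(
            f"This page was deleted from {web} on {today.isoformat()} by {ip}"
        )
        self.pages[(page.web, page.name)] = page
        return page

    def undelete(self, trash_name):
        """Administrator-only in spirit: restore a trashed page."""
        page = self.pages[(TRASH, trash_name)]
        if page.origin in self.pages:
            raise ValueError("a page with the original name exists again")
        del self.pages[(TRASH, trash_name)]
        page.revisions.pop()  # drop the deletion notice
        page.web, page.name = page.origin
        page.origin = None
        self.pages[(page.web, page.name)] = page
        return page
```

The sketch leaves out the confirmation prompt, the robots.txt rule, and any access control on `undelete`. It also does not handle the same name being trashed twice, which a real renaming scheme would need (for example by appending a time stamp).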

• CommentRowNumber12.
• CommentAuthorAlexisHazell
• CommentTimeAug 30th 2018

If they confirm, the page is moved (along with its edit history) to a separate web called “trash”

Yes, this approach is used by Foswiki.

• CommentRowNumber13.
• CommentAuthorTobyBartels
• CommentTimeSep 1st 2018

Thanks for the explanation, Richard. I'm also delighted to see that item on the Todo list. Mike, I like your idea, and I'd like to hear what Richard thinks about it.

I can remain happy despite any number of disagreements about what to do, even errors or (if not in excessive amounts) outright vandalism, as long as the data is easy to track down.

• CommentRowNumber14.

Mike, I like your idea, and I’d like to hear what Richard thinks about it.

It sounds fine to me. I think I would prefer to have the ’trash’ web be clearly distinguished from the others in some way, and for it to be clear that it is not intended to be stumbled upon by accident. Also it would maybe be more of an archive than a web as such. But roughly speaking I have no objection to this approach.

• CommentRowNumber15.
• CommentAuthorMike Shulman
• CommentTimeSep 6th 2018

Yes, it would be good to have the “trash” web be clearly marked. I’ll go ahead and add this to the todo list.

• CommentRowNumber16.
• CommentAuthorAlexisHazell
• CommentTimeNov 14th 2019

So in the absence of a ’Trash’ web, should spam pages be deleted from the backing database (which I can do), or are we renaming them (which I can also do), or … ?

• CommentRowNumber17.
• CommentAuthorUrs
• CommentTimeNov 14th 2019

If you can, please do delete spam pages. Thanks!

(We should be very restrained about deleting for good anything that anyone could consider a genuine contribution. But clear-cut malicious spam should not be kept anywhere in our database.)