Welcome to nForum
    • CommentRowNumber1.
    • CommentAuthorDmitri Pavlov
    • CommentTimeAug 31st 2013
    When searching for Γ-space on Google, the very first link turns out to be a page at the nLab.
    It looks like this:

    Gamma-space - nLab
    ncatlab.org/nlab/new/Gamma-space
    A description for this result is not available because of this site's robots.txt – learn more.

    An HTTP request to the nLab server reveals the source of the problem:
    the link /nlab/new/Gamma-space redirects to /nlab/show/Gamma-space.
    However, instead of using the HTTP 301 code (Moved Permanently)
    it uses the HTTP 302 code (Moved Temporarily).
    The practical effect is that Google treats the redirect as temporary
    and keeps listing /nlab/new/Gamma-space with the useless statement about robots.txt,
    rather than showing the /nlab/show/Gamma-space link with its proper description.

    Can we change the nLab engine to serve such redirects with the HTTP 301 code
    instead of the HTTP 302 code so that Google and other search engines can display the correct link and a proper description?
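
    The check described above could be scripted roughly as follows. This is only a sketch, assuming Python's standard http.client, which does not follow redirects on its own. The helper names classify_redirect and fetch_redirect are hypothetical, and the grouping of status codes into permanent and temporary is the usual HTTP reading, not a statement about how Google's indexer behaves.

```python
import http.client
from urllib.parse import urlsplit

# Usual HTTP reading of redirect codes (an assumption about intent,
# not a description of any search engine's internal behaviour).
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def classify_redirect(status):
    """Return 'permanent', 'temporary', or 'other' for an HTTP status code."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "other"


def fetch_redirect(url, timeout=10.0):
    """Send a HEAD request without following redirects.

    Returns (status_code, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

    Calling fetch_redirect("http://ncatlab.org/nlab/new/Gamma-space") and passing the returned status to classify_redirect would show which kind of redirect the server currently sends.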
    • CommentRowNumber2.
    • CommentAuthorAndrew Stacey
    • CommentTimeAug 31st 2013

    The redirection of things like nlab/new/Gamma-space to nlab/show/Gamma-space was put in to counter the problem that /new/ links to existing pages were appearing in Google searches, so the redirection is not the source of the problem. I’ve not figured out where these search results are coming from, but my hypothesis is that the links exist on pages that Google indexes, and so it lists them. The robots.txt blocks Google from following them, so Google never actually learns that new/existing+page redirects to show/existing+page. So I don’t think that the status code has anything to do with it.

    It’s quite possible that I’m wrong; I’ve not done extensive analysis. But before I start changing the code, I’d like to know how to detect whether the redirection code is making a difference, as I prefer to keep the nLab’s code as close to the main Instiki line as possible.

    See http://nforum.mathforge.org/discussion/4884/odd-result-in-google for the original discussion on this.

    • CommentRowNumber3.
    • CommentAuthorMike Shulman
    • CommentTimeSep 1st 2013

    Why is this not a problem experienced by other wikis, like Wikipedia, that also contain links to not-yet-existent pages?

    • CommentRowNumber4.
    • CommentAuthorZhen Lin
    • CommentTimeSep 1st 2013

    Redlinks on Wikipedia are of the form .../w/index.php?title=...&action=edit&redlink=1, which looks much less like a legitimate URL compared to .../new/.... But I doubt that’s the whole story.

    • CommentRowNumber5.
    • CommentAuthorDmitri Pavlov
    • CommentTimeSep 2nd 2013
    • (edited Sep 3rd 2013)

    Putting something in the robots.txt file does not prevent it from showing up in Google search. On the other hand, a 301 redirect will definitely remove /new/* pages from Google search, provided that they are not blocked by robots.txt in the first place.

    So it seems like a better solution here would be to remove /new/* pages from robots.txt and make them into 301 redirects instead of 302.

    More information on Google and 301 redirects: https://support.google.com/webmasters/answer/93633

    • CommentRowNumber6.
    • CommentAuthorAndrew Stacey
    • CommentTimeSep 2nd 2013

    Given that these pages were turning up before the redirection code was put in place, I’d still like to see more evidence that this change would fix things before making the change.

    The show/* pages are not in robots.txt. The /new/ pages are.

    The really odd thing here is that the page /show/Gamma-space is not showing up at all in a search. Even if I explicitly search for ncatlab.org/nlab/show/Gamma-space or Gamma-space site:ncatlab.org then it doesn’t appear. This seems to be more than a redirect would cause. Again, this problem predates the redirection.

    • CommentRowNumber7.
    • CommentAuthorDmitri Pavlov
    • CommentTimeSep 3rd 2013
    • (edited Sep 3rd 2013)

    Sorry, I meant /new/* pages instead of /show/* pages.

    For example, the page http://www.hochmanconsultants.com/articles/301-versus-302.shtml says “If a 302 is used instead of a 301, search engines might continue to index the old URL, and disregard the new one as a duplicate.”

    This seems to be the case here: Google indexes /new/Gamma-space, which 302-redirects to /show/Gamma-space, and the latter is ignored because the redirect is a 302. With a 301 redirect, the former would be ignored instead.

    • CommentRowNumber8.
    • CommentAuthorAndrew Stacey
    • CommentTimeSep 3rd 2013

    That still doesn’t explain why we were seeing this behaviour before the redirects were put in place. At that time there was no link between the /new/ and the /show/ pages, so the /show/ pages should have shown up in the search. But they didn’t. There are no links that I can find to /new/Gamma-space and there are plenty of links to /show/Gamma-space. I can understand that when Google follows /new/Gamma-space and gets to /show/Gamma-space then it doesn’t index it, but it ought to index it if it gets there directly, shouldn’t it?

    As far as search engines are concerned, the best thing to do with /new/ pages that do actually exist is simply to send a 404. However, that’s not helpful to people, which is why we chose the redirection method.

    The real difficulty here is that it is very hard to find out what Google and other search engines actually do. That article you linked to is full of supposition. So it’s very hard to judge whether changes we make actually fix things in the Google search list. What is easier to do is to ensure that if Google indexes the wrong thing then someone ends up on the right page when they get to the nLab.

    Ideally we’d like to do both.

    Nonetheless, I’ve made the change in the code and we can monitor what happens.

    It would be useful to know when Google indexed that /new/ address and where it got it from. I don’t know how to get that information, though.

    • CommentRowNumber9.
    • CommentAuthorDmitri Pavlov
    • CommentTimeSep 3rd 2013
    • (edited Sep 3rd 2013)

    but it ought to index it if it gets there directly, shouldn’t it?

    I am not sure about this. Google actively tries to remove duplicates from its search results, and redirects are considered duplicates, so it seems plausible that it simply does not index /show/Gamma-space.

    I think it’s best to wait a month and see if Google’s results change.

    I also think it makes sense to remove /nlab/new from robots.txt, at least for the duration of the experiment, because Google might be confused by it. In fact, I think Google won’t even see the 301 redirect unless it is allowed to access /new/Gamma-space, which is only possible if /nlab/new is removed from robots.txt.
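
    The point about robots.txt hiding the redirect can be illustrated with a small sketch using Python's standard urllib.robotparser. The robots.txt text below is an assumption (only a Disallow rule for /nlab/new under a wildcard user agent), the helper names build_parser and crawler_view are hypothetical, and the model describes a crawler that strictly obeys robots.txt, which may or may not match what Google does.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: only the /nlab/new rule matters here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nlab/new
"""

# Redirects the server would send if the crawler were allowed to ask.
REDIRECTS = {
    "http://ncatlab.org/nlab/new/Gamma-space":
        "http://ncatlab.org/nlab/show/Gamma-space",
}


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawler_view(parser, url, redirects, agent="Googlebot"):
    """Describe what a robots.txt-obeying crawler could learn about url."""
    if not parser.can_fetch(agent, url):
        # The URL may still be listed (known from links), but it is never
        # requested, so the status code of the redirect is never seen.
        return "blocked"
    target = redirects.get(url)
    if target is not None:
        return "redirect to " + target
    return "content"


parser = build_parser(ROBOTS_TXT)
new_view = crawler_view(parser, "http://ncatlab.org/nlab/new/Gamma-space", REDIRECTS)
show_view = crawler_view(parser, "http://ncatlab.org/nlab/show/Gamma-space", REDIRECTS)
```

    Under these assumptions the /new/ URL is reported as blocked regardless of whether the server answers 301 or 302, while the /show/ URL is fetched normally. With an empty robots.txt the same call would report the redirect target instead.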

    • CommentRowNumber10.
    • CommentAuthorAndrew Stacey
    • CommentTimeSep 3rd 2013

    I am somewhat wary of removing /new/ from robots.txt, simply because the vast majority of /new/ pages do not exist, and for those we would really rather that the bots didn’t follow them.

    But your last remark confuses me a little: if it didn’t see the 301, surely it wouldn’t see the 302. The problem, as you describe it, is that the 301 is stopping it from removing the /new/Gamma-space from its index. I don’t care if it doesn’t see that /new/Gamma-space redirects to /show/Gamma-space because no-one should be going to /new/Gamma-space. What bothers me is that /new/Gamma-space is being indexed and /show/Gamma-space is not. So if the robots.txt is stopping Google from following /new/Gamma-space and figuring out where it goes to, that’s not a problem so long as it is picking up /show/Gamma-space from somewhere. What you’ve suggested is that the 301 means that when it reaches /show/Gamma-space then it figures “I already know this as /new/Gamma-space so I’ll not bother with it.” and that the 302 means that it now thinks “Ah, even though this is linked to /new/Gamma-space then this is the one I should have.”. As /show/ isn’t blocked by robots.txt then it will do so. It would be a pretty poor design if the robot block on /new/ also applied to /show/!

    So I think I’ll leave robots.txt alone for now. We can revisit it later.

    • CommentRowNumber11.
    • CommentAuthorTobyBartels
    • CommentTimeSep 4th 2013

    FWIW, 301 makes more sense than 302 to me anyway. We don't expect that page to go anywhere.

    • CommentRowNumber12.
    • CommentAuthorAndrew Stacey
    • CommentTimeSep 4th 2013

    I think I got 301 and 302 confused in my previous post. Hopefully it is understandable.

    Just to prove that I didn’t get confused when doing the implementation:

    HTTP/1.1 301 Moved Permanently
    Date: Wed, 04 Sep 2013 06:33:08 GMT
    Server: Apache/2.2.16 (Ubuntu)
    X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 3.0.7
    Cache-Control: no-cache
    X-Runtime: 7
    Location: http://ncatlab.org/nlab/show/Gamma-space
    Content-Length: 106
    Status: 301
    Vary: Accept-Encoding
    Content-Type: text/html; charset=utf-8
    
    <html><body>You are being <a href="http://ncatlab.org/nlab/show/Gamma-space">redirected</a>.</body></html>
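
    A response like the one above can also be checked mechanically. The following is a minimal sketch in Python, assuming the response is available as text; parse_response_head is a hypothetical helper, and SAMPLE is an abbreviated copy of the headers shown above.

```python
# Abbreviated copy of the response shown above (only some headers kept).
SAMPLE = """\
HTTP/1.1 301 Moved Permanently
Date: Wed, 04 Sep 2013 06:33:08 GMT
Location: http://ncatlab.org/nlab/show/Gamma-space
Status: 301
Content-Type: text/html; charset=utf-8

<html><body>You are being redirected.</body></html>
"""


def parse_response_head(raw):
    """Split a raw HTTP response into (status code, reason, headers).

    Header names are lower-cased; the body after the first blank line
    is ignored.
    """
    text = raw.replace("\r\n", "\n")
    head, _, _body = text.partition("\n\n")
    lines = [line.strip() for line in head.strip().splitlines()]
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, so values such as URLs and
        # timestamps keep their own colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


status, reason, headers = parse_response_head(SAMPLE)
```

    Here status and headers["location"] are the two values that matter for the redirect question.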
    
    • CommentRowNumber13.
    • CommentAuthorDmitri Pavlov
    • CommentTimeDec 14th 2013

    Apparently 301 redirects didn’t help: searching for Gamma-space on Google still gives the same result. It seems that the underlying cause of this is that robots.txt has a Disallow: /nlab/new line.

    When Google indexes http://ncatlab.org/nlab/new/Gamma-space, it appears that it sees the 301 redirect to http://ncatlab.org/nlab/show/Gamma-space, and, given the fact that /nlab/new/ is disallowed by robots.txt, it automatically infers that the corresponding /nlab/show/ page should not be indexed either.

    From the page https://www.google.com/webhp#filter=0&q=%22/nlab/new/%22+site:ncatlab.org it seems that whenever an /nlab/new/ page is linked, Google simply will not index the corresponding /nlab/show/ page. In other words, it seems that the scenario in Andrew Stacey’s remark “It would be a pretty poor design if the robot block on /new/ also applied to /show/!” is what actually happens.

    Perhaps removing the Disallow: /nlab/new line from robots.txt will resolve the problem? Of course, bots would then index the /nlab/new/ pages, but since these pages have no content, they will rank very low in search results.