• CommentRowNumber1.
• CommentAuthorTobyBartels
• CommentTimeSep 9th 2009
Look at fibration sequence (revision 9 if necessary). See the horrible unknown characters towards the bottom? Now look at revision 8. Hey, that looks great! (at least if you have the fonts). But here's the thing:

The code for those is exactly the same!

If you go back to when I first put that character in (revision 4), you'll see that same code. But here's the thing:

That's not the code that I put in then!

In fact, that sort of code has never worked; you need to use an SGML numerical character entity code instead. But somehow, the migration replaced that with the actual character, even though iTeX cannot parse that!

And why do the old revisions render properly, even using bad code? I have no idea!

• CommentRowNumber2.
• CommentAuthorAndrew Stacey
• CommentTimeSep 9th 2009

I don't think that this is a migration issue. It's also not quite a character encoding issue. It seems to have to do with iTeX and Unicode characters: sticking an unusual Unicode character into iTeX causes it to complain.

This has suddenly become a problem because the way Instiki parses entities has changed. It seems (from experimenting) that Instiki now translates SGML entities into Unicode. I know that there have been some deep changes in Instiki recently and this may be due to one of them. The thing is that this happens on input: when you put an entity into the code, it gets translated to a Unicode character before it gets stored in the database. However, iTeX seems to have issues with certain Unicode characters and changes them to 'unknown character'. That iTeX is responsible can be seen from the fact that $?$ produces 'unknown character' whilst ? is fine.
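To make the input-side translation concrete, here is a minimal Python sketch of the same kind of conversion. It is only an illustration: it is not Instiki's code, and the entity `&#x21A0;` is an arbitrary example picked for the demonstration, not necessarily the character on the page in question.

```python
import html

# What the author typed: a numeric character entity inside maths.
# (The particular entity is an assumption chosen for illustration.)
typed = "$A &#x21A0; B$"

# What ends up stored if entities are translated on input.
stored = html.unescape(typed)

print(stored)            # the entity is now a literal Unicode character
print(stored.isascii())  # the stored source is no longer pure ASCII
```

Once the text has been stored in this form, the entity is gone from the source, so anything downstream that only copes with the entity form sees a raw character instead.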

The fact that this happens only with new revisions is what makes me think that it is a new change to the input parser. As old revisions are not changed, it has to be something that happens before they get stored in the database, and it has to be something that changed since these old revisions were stored.

That's also why the old revisions work: they were stored in the database before the change was made and so didn't go through this particular input alteration. You as a user can only see the code for the latest revision, but by going into the database I can see that the old revisions still have the SGML entity code and that only the newest revision has the Unicode character.

I'll mention this to Jacques and put a test page on the Lab Elves web for him to look at.

• CommentRowNumber3.
• CommentAuthorAndrew Stacey
• CommentTimeSep 9th 2009

Jacques says that this will be fixed in the next update.

• CommentRowNumber4.
• CommentAuthorAndrew Stacey
• CommentTimeSep 9th 2009

Update done. Please let me know which pages this still affects as I'll have to fix 'em in the database.
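For reference, the kind of repair I have in mind could look roughly like the sketch below. It assumes that maths is delimited by `$...$` or `$$...$$` and that it is enough to turn every non-ASCII character inside those delimiters back into a hexadecimal numeric entity; the function name `fix_math` is made up for this example, and the real fix would be run against the stored page source.

```python
import re

# Assumption: maths is delimited by $...$ or $$...$$ (display form tried first).
MATH = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)

def fix_math(source: str) -> str:
    """Rewrite non-ASCII characters inside maths as numeric entities."""
    def to_entities(match: re.Match) -> str:
        return "".join(
            ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
            for ch in match.group(0)
        )
    return MATH.sub(to_entities, source)
```

Text outside the maths delimiters is left untouched, so genuine Unicode in ordinary prose survives.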

• CommentRowNumber5.
• CommentAuthorTobyBartels
• CommentTimeSep 10th 2009

I only know of fibration sequence and Sandbox offhand, although there certainly are other articles with funky characters in them given by SGML numerical character entities. If and when I find them, then I'll report them here.

• CommentRowNumber6.
• CommentAuthorAndrew Stacey
• CommentTimeSep 10th 2009

Unfortunately, they may be tricky to find. Any page with an entity that was edited in the critical period will have had it converted to the Unicode equivalent. Outside iTeX we could just let this pass (or is there a problem with genuine Unicode in pages?), but inside iTeX it may be an issue. It'll probably be easier for me to search in the database, if I can figure out how to code "look for all weird characters that were submitted in the last 24 hours".
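A rough Python sketch of that search, assuming the revisions have already been pulled out of the database as `(page, revised_at, content)` tuples; that tuple layout, the function name `suspect_pages` and the sample rows are all made up for illustration and do not reflect the actual schema or data.

```python
import re
from datetime import datetime, timedelta

# Assumption: maths is delimited by $...$ or $$...$$.
MATH = re.compile(r"\$\$.+?\$\$|\$.+?\$", re.DOTALL)

def suspect_pages(revisions, now, window=timedelta(hours=24)):
    """Map each recently revised page to the non-ASCII characters in its maths."""
    hits = {}
    for page, revised_at, content in revisions:
        if now - revised_at > window:
            continue  # stored before the critical period
        odd = {ch for m in MATH.finditer(content) for ch in m.group(0) if ord(ch) > 127}
        if odd:
            hits.setdefault(page, set()).update(odd)
    return {page: sorted(chars) for page, chars in hits.items()}

# Made-up sample rows, only to show the shape of the input.
sample = [
    ("page A", datetime(2009, 9, 9, 12, 0), "$A \u21a0 B$"),    # recent, odd char in maths
    ("page B", datetime(2009, 9, 1, 12, 0), "$A \u21a0 B$"),    # too old to be affected
    ("page C", datetime(2009, 9, 9, 13, 0), "caf\u00e9, $x+y$"),  # Unicode only outside maths
]
report = suspect_pages(sample, now=datetime(2009, 9, 9, 18, 0))
```

Only characters inside maths are reported, on the assumption that genuine Unicode elsewhere in a page is harmless.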