I have improved the security of tex diagram rendering and protection against javascript injection attacks in this commit, though the most important aspect of that, namely the setting of openin_any = p in texmf.cnf, is not in code (for now: in future, with for example a dockerised infrastructure, we will be able to have it in code).
The security of the tex renderer should be good now. The protection against javascript injection is not as good, but is not terrible. I have a plan for improving it, but it is dependent on some more fundamental improvements to the renderer which I am planning first. (The issue with every HTML sanitiser that I have found, and why I am not simply adding one to the renderer already, is that they are too slow for our larger pages; this one is really good for example, but cannot cope with our larger pages.)
Some people enjoy trying to find security holes; I would very much welcome this if anybody wishes to try it, e.g. feel free to try to carry out some harmless javascript in the Sandbox. Just let me know beforehand if you plan to do anything which might have consequences.
With continued pressure being placed upon me to strengthen the protection against cross-site scripting attacks, and in the absence of a library that I am happy with for this/that is fast enough, I have taken the somewhat radical step of disallowing all user-entered XML tags except those in a very small whitelist. This basically excludes the user from entering any MathML, SVG, or HTML by hand in the edit pane.
This will mean that some pages which parsed before will now get an error the next time they are edited. This mainly concerns <img> tags referring to files uploaded to the nLab. For this, there is an alternative in the Markdown/Instiki syntax using :pic, which should work and should take only a few seconds to change. If it does not work, let me know. There are also one or two HTML tables around; unfortunately for now one will have to approximate with Markdown syntax as best one can. I will try to implement a more flexible alternative when I get the chance. Finally, there are some SVG pictures and pieces of MathML. These can be re-done in Tikz/LaTeX.
I expect there to be some teething problems with this, just let me know when you encounter them. But basically this is a simple way to forget about client-side security issues without affecting rendering speed, and I think is the best solution for us currently.
Thanks for being willing to keep working on this! I am curious what is on the whitelist, and why it is more trustworthy than table? I guess I can see that img is more dangerous since it contains a URL, but I can’t right now think of any way to misuse table that wouldn’t apply equally well to any other HTML tag. (Displaying my ignorance about XSS here, I guess.)
I think if one tries hard enough, one can find a vulnerability with almost any HTML tag! For ’table’, see for example 2.64 here. One can also use things like onclick, onmouseover, etc (these were all blacklisted in my original implementation, as was the particular attack described at the link), and one can further inject inside ’style’.
Currently all HTML tags are blacklisted. The only thing on the whitelist is ’nowiki’, which is a piece of Instiki syntax. I might have forgotten something else, in which case I’ll add it.
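For readers who want to see the input-side rule concretely, here is a minimal Python sketch of a check that rejects every tag not on a whitelist. It is an illustration only and not the nLab's actual code: the regular expression, the function name find_disallowed_tags and the constant ALLOWED_TAGS are assumptions made for the example.

```python
import re

# Hypothetical whitelist; the thread only names 'nowiki'.
ALLOWED_TAGS = {"nowiki"}

# Captures the name of an opening or closing tag, e.g. <img ...>, </table>, <br/>.
# Deliberately strict: "<b" inside a formula would also be flagged.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9:-]*)")


def find_disallowed_tags(source: str) -> list[str]:
    """Return the sorted, lower-cased names of tags that are not whitelisted."""
    found = {match.group(1).lower() for match in TAG_NAME.finditer(source)}
    return sorted(found - ALLOWED_TAGS)


if __name__ == "__main__":
    # An edit would be refused whenever this list is non-empty.
    print(find_disallowed_tags('<img src="a.jpg"/> and <TABLE><tr>'))
```

A check of this kind never has to parse the markup, which is why it costs almost nothing even on large pages; the price is that it cannot tell a harmless tag from a harmful one.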
Thanks, Richard. It’s a tricky business getting this right, and you’re the man on the spot!
Thanks for all your work on this!
Regarding the img-tag: I have been using this in many entries, would be sad if the images wouldn’t display anymore. Now I just checked, and they do still display.
The reason I never used :pic instead is that I don’t know that it supports adjusting the image size.
Hi Urs, thanks! They do still display, but editing will fail next time. My plan is to expand the pic syntax (similarly for other things) so that it has the features that people need. I will look into image size asap.
Unfortunately, though it will be disruptive for a while, I think this is the best way forward for us.
Thanks, I see.
Maybe we can treat this issue as a priority, because many of the pages which I update frequently (with recent references, such as flavour anomaly) happen to be those that have many included images.
Can we maybe tell the parser that displaying an image stored on the nLab server is safe to do, and block only the display of images from random external sites?
I think a better solution would be to run an HTML sanitizer after the page is rendered, to remove problematic things and leave other tags alone. This is what vanilla Instiki does, and what Jacques and I have been suggesting for weeks. Instiki’s sanitizer runs on our larger pages in a few seconds.
Definitely this is the highest priority, Urs, I agree completely. Good idea regarding restricting the URLs, I considered this myself; I rejected it because I thought it would be tricky to avoid opening loopholes. Maybe it could be made to work.
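A rough Python sketch of what restricting image URLs to the nLab server could look like, mainly to show where the loopholes hide. Everything here is an assumption made for illustration: the host name ncatlab.org, the function name is_trusted_image_url, and the choice to accept site-relative paths.

```python
from urllib.parse import urlsplit

# Assumption: uploaded images are served from this host.
TRUSTED_HOSTS = {"ncatlab.org"}


def is_trusted_image_url(url: str) -> bool:
    """Accept only site-relative paths or http(s) URLs on a trusted host."""
    # Browsers normalise backslashes and drop some whitespace and control
    # characters, so reject them outright instead of guessing the result.
    if any(ch == "\\" or ch.isspace() or ord(ch) < 0x20 for ch in url):
        return False
    if url.startswith("/"):
        # "/nlab/files/pic.jpg" is fine, but "//host/..." names another host.
        return not url.startswith("//")
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in {"http", "https"} and parts.hostname in TRUSTED_HOSTS


if __name__ == "__main__":
    for candidate in ("/nlab/files/pic.jpg", "//evil.example/x.jpg",
                      "javascript:alert(1)"):
        print(candidate, is_trusted_image_url(candidate))
```

Even this small check needs three separate guards (odd characters, protocol-relative paths, look-alike host names), which gives some idea of why the approach is easy to get subtly wrong.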
Regarding running the Instiki sanitiser afterwards, I have given reasons already privately and more briefly here why I am not keen. It must be remembered that it is easy to demand changes to something from the sides; unfortunately, part of what I have to do is consider how these demands fit into a broader picture, and to try to evaluate whether they are the right thing to do for the software as it is at the present time.
A while back I was wondering how we might pass attribute information to an <img> without using that tag. I thought something like the following might be made to work.
[[Adjointness.jpg:pic {: width="240"}]]
The seldom used {: construct can be used to set style on things like divs.
Just to record it somewhere: This Friday’s preprint
ought to be recorded at flavour anomaly, right after the entry Cata-Mannel 19,
Maybe we can sneak it in there by some trick?
Hi Richard,
I see that now also <br/> is forbidden. I have been using this in many entries after you had recently told me I should prefer this over the Instiki hack that I used to do :-).
I see we are getting into some tricky situation here. Hm..
Surely running the instiki sanitizer would be at least as secure as the current solution, and less disruptive to our use of the nlab?
Blocking all HTML is definitely more secure than running a sanitiser! The point of a sanitiser is to validate user-entered HTML; there is no problem to solve if no HTML can be entered! The only loophole is with Tex, where one might possibly be able to cook up an SVG or piece of MathML that is dodgy without explicitly writing it, i.e. programmatically. This is far-fetched, though, and can in any case be solved in various ways, e.g. by running a sanitiser just on SVG or MathML blocks (here we can use DOMParser or whatever, since the amount of code is much smaller).
One can also note that the Instiki sanitiser is vulnerable to being out of date and, because it relies on XML parsing, to XXE attacks and similar (indeed it definitely had holes in this regard before; I think some of these have now been closed in its latest versions, but I wouldn’t be surprised if there are others). As a dependency for this, I would only ever trust a very actively maintained project like DOMParser.
As I wrote, there will definitely be some disruption for a while, but it will pass. We just need to treat the things that crop up one by one, exactly as Urs is doing here. In the long run, it is good anyhow to have tighter control over the permissible syntax.
The point of a sanitizer is to validate the generated HTML, which includes any user-entered HTML as well as the HTML produced through TeX/SVG/MathML/Markdown etc. Since the end goal is to ensure that the generated HTML is not dangerous, the most reliable way to achieve that is to actually validate the generated HTML, rather than trying to deal separately with all the possible ingredients that might go into it.
Security-wise, I’m content for now with the solution of blocking all HTML input (we can discuss the long run later). But from what Urs says it seems to be very disruptive, and I think running a sanitizer would be an equally good solution security-wise that would be less disruptive. If you don’t like the instiki sanitizer, the feedparser sanitizer also runs in only a few seconds on our large pages, and is written in Python.
Re #6, #8, and #13: I’ve now added some new syntax to allow for creation of <img> tags and <br/>. For <br/>, one simply writes \linebreak. For images, one writes
\begin{imagefromfile}
"file_name": "my_picture.jpg",
"web": "schreiber",
"width": 300,
"height": 100,
"unit": "px",
"float": "right",
"margin": {
  "top": 0,
  "right": 10,
  "bottom": 10,
  "left": 0,
  "unit": "px"
},
"alt": "My nice picture",
"caption": "My nice caption"
\end{imagefromfile}
The structure inside the block is known as ’JSON’; it is very widely used (indeed more so than XML in backend web development). All of these options except file_name are optional or have defaults. The default web is nlab, so unless in a very rare case the file was uploaded to a personal web rather than the main nLab, one can ignore this parameter. The width and height parameters are omitted by default, and should be fairly self-explanatory, corresponding to these attributes of an img tag. They must be integers. By default, the unit is px, but a different one can be specified in unit: em is supported at the moment, but we can add whatever else. The alt attribute is the same as for img, but only the characters of the Roman alphabet, the numerals 0-9, spaces, and - are allowed, to avoid any possible security vulnerability.
Similarly, float and margin add styling in the same way as for ordinary HTML. Currently, float accepts only left or right. Hopefully margin is by now self-explanatory; it is entered in a JSON structure, but serialises in the usual way to a string (0px 10px 10px 0px in the case above). Underneath the hood, the <img> is wrapped in a div styled with these two parameters. If we need any other bits and bobs, I’ll add them as need arises.
The caption attribute allows one to add a caption to a figure by wrapping the <img>, and the optional <div> that it lives in, in a <figure> environment containing a <figcaption>. It has the same restrictions on characters as alt.
In other words, the minimal use of the new syntax is as follows.
\begin{imagefromfile}
"file_name": "my_picture.jpg"
\end{imagefromfile}
I first considered something like Rod suggested in #11, but I’m moving a bit away from that kind of syntax, because I think a key-value format like JSON is more readable and more flexible, and is easy to parse using a library.
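To illustrate the "easy to parse using a library" point, here is a minimal Python sketch that parses the body of an imagefromfile block with the standard json module and renders it along the lines described above. It is a guess at the behaviour, not the nLab implementation: the function name render_image, the URL pattern /web/files/file_name, the set of characters allowed in file names, and the choice to emit width and height as inline style are all assumptions.

```python
import json
import re

SAFE_TEXT = re.compile(r"[A-Za-z0-9 -]*")  # characters allowed in alt and caption
SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_.-]*")  # assumption: file/web names
UNITS = {"px", "em"}
SIDES = ("top", "right", "bottom", "left")


def _int(value):
    # bool is a subclass of int in Python, so compare the type exactly.
    if type(value) is not int:
        raise ValueError(f"expected an integer, got {value!r}")
    return value


def _unit(value):
    if value not in UNITS:
        raise ValueError(f"unsupported unit: {value!r}")
    return value


def _text(value):
    if not isinstance(value, str) or not SAFE_TEXT.fullmatch(value):
        raise ValueError(f"disallowed characters in {value!r}")
    return value


def render_image(block_body: str) -> str:
    """Render the text between \\begin{imagefromfile} and \\end{imagefromfile}."""
    opts = json.loads("{" + block_body + "}")  # the block omits the outer braces
    file_name, web = opts["file_name"], opts.get("web", "nlab")
    for name in (file_name, web):
        if not isinstance(name, str) or not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe name: {name!r}")
    unit = _unit(opts.get("unit", "px"))
    size = "".join(f"{k}:{_int(opts[k])}{unit};" for k in ("width", "height") if k in opts)
    img = f'<img src="/{web}/files/{file_name}"'
    if "alt" in opts:
        img += f' alt="{_text(opts["alt"])}"'
    if size:
        img += f' style="{size}"'
    img += "/>"
    div_style = ""
    if "float" in opts:
        if opts["float"] not in {"left", "right"}:
            raise ValueError("float must be left or right")
        div_style += f'float:{opts["float"]};'
    if "margin" in opts:
        margin = opts["margin"]
        if not isinstance(margin, dict):
            raise ValueError("margin must be a JSON object")
        m_unit = _unit(margin.get("unit", "px"))
        div_style += "margin:" + " ".join(f"{_int(margin.get(s, 0))}{m_unit}" for s in SIDES) + ";"
    html = f'<div style="{div_style}">{img}</div>' if div_style else img
    if "caption" in opts:
        html = f"<figure>{html}<figcaption>{_text(opts['caption'])}</figcaption></figure>"
    return html


if __name__ == "__main__":
    print(render_image('"file_name": "my_picture.jpg"'))
```

Because every value is either an integer, a member of a small fixed set, or a string matched against a narrow character class, nothing the user types reaches the output unchecked. Note that json.loads is strict: a trailing comma before a closing brace is a parse error.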
I’ll eventually add this to the HowTo, but it is not currently possible due to an HTML table there (a new syntax for that is next on the TODO list).
Regarding #35 here, the above syntax takes care of one use of the div tag, namely making the figure float; there was one such figure at flavour anomaly, and one replaces the entire div in such cases. The difference with the figure that did not parse at Urs’ personal web is that it is clickable, i.e. is a link: it will be straightforward to extend the syntax to cover that case.
We can certainly debate how to proceed. My personal feeling is that, for purely semantical reasons (i.e. ignoring security), the new syntax is preferable to entering HTML directly: we know exactly what the intention is, and have much more control. But it’s true that there are a fair few pages using <img>. My original feeling was that it would not be too much work to make the syntactical tweaks as we need to, once we cover all syntactical cases that we need, since there are rarely many images on a page (the old syntax still displays). But we can certainly discuss whether that is reasonable. My feeling is that, unless we block the old syntax, it will not get done, but I could be wrong.
If we are making changes to improve security, I’d prefer to make changes which bring broader benefit to the nLab. I don’t feel that a sanitiser does that: the only effect for the average user will be to slow things down and obfuscate. I think the stricter approach does, because of the semantical benefits of the new syntax.
Despite its frequent denigration, the blacklist we have is actually not bad for stopping attacks; it definitely stops most if not all major javascript attack vectors, without being open to the vulnerabilities of XML parsing. Thus it can serve a purpose in the short term.
Thanks, yes, I do like the new syntax you are creating. I suppose a script could go and replace syntax, though of course I see that this would mean placing yet more work on your shoulders.
I can certainly cope with adjusting a few img-tags when I see them. I am just worried about random nLab users who may find the edit function not work.
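As a rough idea of what such a replacement script might look like, here is a Python sketch that rewrites only the simplest <img> tags and leaves everything else for a human. It is hypothetical throughout: the function name convert_img_tags, the assumption that uploaded files live at /web/files/name (optionally prefixed by https://ncatlab.org), and the decision to skip any tag with attributes other than src, width, height and alt are choices made for the example.

```python
import json
import re

IMG_TAG = re.compile(r"<img\b([^>]*)>", re.IGNORECASE)
ATTR = re.compile(r'([A-Za-z-]+)\s*=\s*"([^"]*)"')
# Assumption: uploaded files are served from /<web>/files/<name>.
FILE_URL = re.compile(
    r"(?:https?://ncatlab\.org)?/([A-Za-z0-9_-]+)/files/([A-Za-z0-9_.-]+)")
SAFE_TEXT = re.compile(r"[A-Za-z0-9 -]*")  # characters allowed in alt
SIMPLE_ATTRS = {"src", "width", "height", "alt"}


def _convert(match):
    original, raw = match.group(0), match.group(1)
    attrs = {name.lower(): value for name, value in ATTR.findall(raw)}
    # Anything left over (single-quoted or bare attributes) is not understood.
    leftover = ATTR.sub("", raw).strip().rstrip("/").strip()
    url = FILE_URL.fullmatch(attrs.get("src", ""))
    if leftover or url is None or not set(attrs) <= SIMPLE_ATTRS:
        return original
    if not SAFE_TEXT.fullmatch(attrs.get("alt", "")):
        return original
    web, file_name = url.groups()
    opts = {"file_name": file_name}
    if web != "nlab":
        opts["web"] = web
    for key in ("width", "height"):
        if key in attrs:
            if not re.fullmatch(r"[0-9]+", attrs[key]):
                return original  # e.g. width="50%": leave for a human
            opts[key] = int(attrs[key])
    if "alt" in attrs:
        opts["alt"] = attrs["alt"]
    body = ",\n".join(f"{json.dumps(k)}: {json.dumps(v)}" for k, v in opts.items())
    return "\\begin{imagefromfile}\n" + body + "\n\\end{imagefromfile}"


def convert_img_tags(source: str) -> str:
    """Replace simple <img> tags by imagefromfile blocks; leave the rest as is."""
    return IMG_TAG.sub(_convert, source)


if __name__ == "__main__":
    print(convert_img_tags('<img src="/nlab/files/Adjointness.jpg" width="240"/>'))
```

Tags wrapped in a floating div, clickable images and images hosted elsewhere are untouched by this sketch, so a pass like this would shrink the manual work without removing it.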
There are 277 pages left on the main nLab with an img tag in them. If people are happy to make the change in syntax to the new one, we could perhaps just fix these together. I suggest that anyone wishing to help just makes a single new post below which they update when they fix a page. No need to announce the change otherwise, I suggest.
120-cell.md
5-dimensional_supergravity.md
6d_(2,0)-superconformal_QFT.md
action_functional.md
Adams–Novikov_spectral_sequence.md
Adams_spectral_sequence.md
ADE_classification.md
ADE_singularity.md
adjoint_modality.md
adjunction.md
AdS-CFT.md
AdS-QCD_correspondence.md
advanced_and_retarded_causal_propagators.md
A_first_idea_of_quantum_field_theory_--_Fields.md
A_first_idea_of_quantum_field_theory_--_Free_quantum_fields.md
A_first_idea_of_quantum_field_theory_--_Interacting_quantum_fields.md
A_first_idea_of_quantum_field_theory_--_Propagators.md
A_first_idea_of_quantum_field_theory_--_Renormalization.md
A_first_idea_of_quantum_field_theory_--_Spacetime.md
AGT_correspondence.md
algebraic_number.md
anomalous_magnetic_moment.md
associativity.md
asymptotic_safety.md
Aufhebung.md
baryon.md
Bertrand_Toën.md
Bert_Schellekens.md
black_brane.md
black_branes_in_supergravity_--_table.md
black_holes_in_string_theory.md
bottom-up_and_top-down_model_building.md
boundary_conformal_field_theory.md
branched_cover.md
brane.md
carbon.md
category.md
causal_propagator.md
classical_model_structure_on_simplicial_sets.md
cone_(Riemannian_geometry).md
coordinate_system.md
cosmic_cube.md
cosmological_constant.md
CW_complex.md
D4.md
D6-brane.md
D7-brane.md
dark_matter.md
David_Corfield.md
David_Micahel_Roberts.md
David_Michael_Roberts.md
David_Roberts.md
D-brane.md
dependent_product.md
dependent_product_type.md
differentiable_manifold.md
diffiety.md
dihedral_group.md
Dirac_propagator.md
double_category.md
double_dimensional_reduction.md
duality_between_F-theory_and_heterotic_string_theory.md
duality_between_M¤F-theory_and_heterotic_string_theory.md
Dynkin_diagram.md
Dynkin_quiver.md
E6.md
electron-photon_interaction.md
electron_propagator.md
embedding_of_differentiable_manifolds.md
energy.md
enhanced_gauge_symmetry.md
equivariant_sphere_spectrum.md
Euclidean_field_theory.md
exact_couple.md
extranatural_transformation.md
F1-brane.md
Feynman_diagram.md
Feynman_propagator.md
finite_rotation_group.md
flavour_anomaly.md
flop_transition.md
FQFT.md
fractional_D-brane.md
Frobenius_algebra.md
FRS-theorem_on_rational_2d_CFT.md
F-theory.md
Fukaya_category.md
Fulton-MacPherson_operad.md
function_type.md
function_type_natural_deduction_-_table.md
functor.md
fundamental_group_of_the_circle_is_the_integers.md
fundamental_(infinity,1)-category.md
fundamental_solution.md
fundamental_theorem_of_finitely_generated_abelian_groups.md
G2.md
galactic_rotation_curves.md
galaxy_rotation_curve.md
gaugino.md
General_Discussion.md
generalized_(Eilenberg-Steenrod)_cohomology.md
generalized_homology.md
geometric_engineering_of_quantum_field_theory.md
geometric_infinity-function_theory.md
geometric_quantization.md
geometry_of_physics_–_basic_notions_of_category_theory.md
geometry_of_physics_--_BPS_charges.md
geometry_of_physics_--_categories_and_toposes.md
geometry_of_physics_-_cohesive_toposes.md
geometry_of_physics_--_fundamental_super_p-branes.md
geometry_of_physics_--_homotopy_types.md
geometry_of_physics.md
geometry_of_physics_--_perturbative_quantum_field_theory.md
geometry_of_physics_--_superalgebra.md
geometry_of_physics_--_supergeometry.md
geometry_of_physics_--_supersymmetry.md
Gepner_model.md
globe.md
Globular.md
gluing_function.md
gluino.md
graph_complex.md
Green-Schwarz_action_functional.md
G-representation_spheres_are_G-CW-complexes.md
groupoid.md
GSO_projection.md
GUT.md
Hadamard_distribution.md
hadron.md
Hegelian_taco.md
Higgs_field.md
holographic_entanglement_entropy.md
holographic_principle_of_higher_category_theory.md
homotopy_equivalence.md
homotopy_groups_of_spheres.md
homotopy.io.md
homotopy.md
Homotopy_Type_Theory_--_Univalent_Foundations_of_Mathematics.md
Hopf_invariant_one.md
Hořava-Witten_theory.md
How_to_get_started.md
HowTo.md
icosahedron.md
immersion_of_smooth_manifolds.md
infinitesimal_object.md
internal_hom.md
intersecting_D-brane_model.md
interval_object.md
Introduction_to_Cobordism_and_Complex_Oriented_Cohomology.md
Introduction_to_Homotopy_Theory.md
Introduction_to_Spectral_Sequences.md
Introduction_to_Stable_homotopy_theory_--_1-1.md
Introduction_to_Stable_Homotopy_Theory.md
Introduction_to_the_Adams_Spectral_Sequence.md
Introduction_to_Topological_K-Theory.md
Introduction_to_Topology_--_1.md
Introduction_to_Topology_--_2.md
Introduction_to_Topology.md
Jacques_Distler.md
J-homomorphism_and_chromatic_homotopy.md
J-homomorphism.md
Jocelyn_Ireson-Paine.md
John_Baez.md
Joseph_Goguen.md
Kaluza-Klein_monopole.md
Kan_extension.md
Klein-Gordon_equation.md
Lamb_shift.md
landscape_of_string_theory_vacua.md
lax_2-adjunction.md
Lie_2-algebra.md
limits_and_colimits_by_example.md
M2-brane.md
M5-brane.md
M9-brane.md
Mac_Lane's_proof_of_the_coherence_theorem_for_monoidal_categories.md
magic_pyramid.md
manifold.md
mapping_cone.md
M-brane.md
McKay_correspondence.md
McKay_quiver.md
meson.md
microcausal_polynomial_observable.md
model_structure_on_topological_sequential_spectra.md
monad.md
MOND.md
monoid.md
Monster_group.md
M-theory.md
M-theory_on_G2-manifolds.md
normed_division_algebra.md
NS5-brane.md
nucleus_(physics).md
observable_universe.md
octahedron.md
octonion.md
orbifold.md
orientifold_plane.md
perturbation_theory.md
Peter_May.md
photon_propagator.md
piecewise_flat_spacetime.md
(p,q)5-brane.md
(p,q)-string.md
prime_number.md
principal_ideal_domain.md
product_of_simplices.md
quantum_electrodynamics.md
quantum_gravity.md
quantum_propagator.md
quark-gluon_plasma.md
quaternion_group.md
quaternionic_Hopf_fibration.md
radiative_correction.md
renormalization.md
representable_functor.md
Ricci_flow.md
Riemannian_orbifold.md
RR-field_tadpole_cancellation.md
Schur_functor.md
Science_of_Logic.md
simplicial_set.md
skyrmion.md
S-matrix.md
Smith_normal_form.md
sphere.md
Spin(5).md
spin_representation.md
squark.md
standard_model_of_cosmology.md
Starobinsky_model_of_cosmic_inflation.md
star_product.md
statistical_significance.md
stereographic_projection.md
string_phenomenology.md
string_scattering_amplitude.md
string_theory_FAQ.md
string_theory.md
structure_formation.md
subspace_topology.md
supersymmetry_breaking.md
symmetric_group.md
tadpole.md
tangent_bundle.md
Taub-NUT_space.md
tensor_network.md
tensor_networks.md
tetrahedron.md
Top.md
topological_interval.md
topological_manifold.md
topological_vector_bundle.md
Topologie.md
topology.md
torus.md
triality.md
triangle_identities.md
Tully-Fisher_relation.md
twistor_space.md
Urs_Schreiber.md
Urysohn's_lemma.md
vacuum_diagram.md
vacuum_expectation_value.md
variational_bicomplex.md
vector_bundle.md
virtual_double_category.md
weak_gravity_conjecture.md
Wick_algebra.md
Wick_rotation.md
Wick's_lemma.md
worldline_formalism.md
worldsheet_instanton.md
wrapped_brane.md
Yoneda_embedding.md
Yoneda_lemma.md
Yukawa_coupling.md
Ones that I’ve done so far.
5-dimensional supergravity (NB: The quality of this image is poor. Can we rescan it or type it out?)
6d (2,0)-superconformal QFT (This is a good example of how to use the float syntax, replacing a div with an img in it).
action functional (This is a good example of how to add a caption in the syntax.)
Perhaps it would be better to list these in a new thread?
Hi Ali, thanks, you’re probably right! But maybe since I’ve started, we can continue here? I guess not all that many people will contribute, so it shouldn’t take up too much space, and it is related at least to the security discussion.
All right, thanks Richard. I changed the image-syntax at ADE singularity now. Will do more as I find the time.
Great! I think this is a very good development, irrespective of security, so it is worth the effort. It allows us, for example, to switch the rendering of images across all of the nLab without any edits. It also provides consistency. And things like the caption option make it trivial for the nLab editor to do things that previously needed non-trivial CSS.
Sounds good!
Two small things, vaguely related
1) I just tried to use a \linebreak within a table, but it didn’t seem to work. Not too important, but if this has an easy fix, I would appreciate it.
2) Some user’s puzzlement alerted me that the third line at HowTo – How to upload a file does not render properly anymore. It’s supposed to show the escaped code for triggering file upload, but instead it shows an escaped a-tag now. (I am guessing this might be a side-effect of your latest work?) Again, not too urgent, but this might be worth fixing to avoid some user confusion.
1) I just tried in the Sandbox, and it seemed to work for me. Could you point me to an example where it does not work but where <br/> does work?
2) I have fixed this and a number of other similar cases at HowTo now. I have also described the new image syntax in the ’Image files’ section, and made a few other updates. The cause of the rendering issues actually goes back to the major changes in the renderer in June; because of the interaction between the new and old renderer, it is now necessary to use a <nowiki>...</nowiki> block in cases like this (and this is probably good practice anyhow). I have been fixing these when I see them/have the energy to do so; it typically only affects ’meta pages’ like HowTo. Let me know if I’ve still overlooked something at the latter page.
Regarding sanitizing input versus output.
The particular attacks listed on this page appear to be blocked with your current input filter. But you can’t expect to be able to list all attacks; the point is that filtering the output rather than the input is the secure way to go.