xml >> Unicode values

by tamizhselvys » Tue, 19 Feb 2008 22:00:55 GMT


Can anyone explain the difference between a Unicode value and a
hexadecimal entity as used in XML?



by Andreas Prilop » Tue, 19 Feb 2008 22:23:51 GMT

For example, the Devanagari letter 'ka' has the position U+0915
in Unicode and can be referenced in both HTML and XML as &#2325;
or as &#x915;.
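The two forms are the same number, written once in decimal and once in hexadecimal. A minimal Python sketch of that relationship follows. Python and the helper name numeric_refs are illustrative choices here, not something from the thread.

```python
import html


def numeric_refs(ch):
    """Return the (decimal, hexadecimal) character references for one character.

    Hypothetical helper, for illustration only.
    """
    code_point = ord(ch)  # for U+0915 this is 2325 in decimal
    return f"&#{code_point};", f"&#x{code_point:X};"


ka = "\u0915"  # DEVANAGARI LETTER KA
decimal_ref, hex_ref = numeric_refs(ka)
print(decimal_ref, hex_ref)

# Both references decode back to the same single character.
same_character = html.unescape(decimal_ref) == html.unescape(hex_ref) == ka
print(same_character)
```

html.unescape is used only as a convenient decoder. An XML parser resolves the same two references in the same way.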

Solipsists of the world - unite!


by Andy Dingley » Wed, 20 Feb 2008 00:25:14 GMT

Try searching for "Jukka Korpela" and Unicode. He has an O'Reilly book
and a very useful website on the topic. Wikipedia is worth reading too.

"Unicode" defines a "character set". There are also "encodings" that
specify how computers interpret sequences of bytes or numbers to turn
them into characters. There may be many encodings that all specify the
same character in the same character set, which can get complicated.
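To make the character set / encoding split concrete, here is a small Python sketch (the choice of language and of these three encodings is an assumption for illustration). It takes one Unicode character and shows the different byte sequences that different encodings use for it.

```python
char = "\u00f8"  # LATIN SMALL LETTER O WITH STROKE, code point 248 (0xF8)

# One character, three encodings, three different byte sequences.
encoded = {name: char.encode(name) for name in ("iso-8859-1", "utf-8", "utf-16-be")}
for name, raw in encoded.items():
    print(f"{name:10} -> {raw.hex()}")

# Decoding each byte sequence with its own encoding gives the same character.
round_trips = all(raw.decode(name) == char for name, raw in encoded.items())
print(round_trips)
```

The code point never changes. Only the bytes used to store it do.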

Character sets before Unicode tended to work for only one language at
a time. This made them manageably smaller, but also inconvenient for
multi-language work. Unicode takes a different approach: one single,
huge character set for everything.

When you use HTML or XML, there is only _one_ character set that is
ever used: Unicode.

There may be lots of different encodings for an HTML or XML document
(one at a time), but they all lead to Unicode characters. Most
commonly you will specify a character directly (e.g. by typing it),
which also requires you to make sure it's in a suitable encoding for
the document. Alternatively you can use a "numeric character entity"
(strictly, a "numeric character reference") to specify the Unicode
character "ø" by its identifying number, either in decimal &#248; or
in hexadecimal &#xF8;. No matter what the document's encoding, these
same numbers refer to these same characters: it's skipping the
encoding and going straight to Unicode. This works equally in XML or
HTML.
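A rough way to check this with Python's standard xml.etree.ElementTree parser is sketched below. The template document and the helper parse_text are made up for the example. The same markup is serialised in three encodings, and the references &#248; and &#xF8; come out as the same character each time.

```python
import xml.etree.ElementTree as ET

# The document body contains only ASCII; the character is given by number.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><p>&#248;&#xF8;</p>'


def parse_text(enc):
    """Serialise the template in the given encoding, parse it, return the text."""
    data = TEMPLATE.format(enc=enc).encode(enc)
    return ET.fromstring(data).text


results = {enc: parse_text(enc) for enc in ("us-ascii", "iso-8859-1", "utf-8")}

# Every encoding yields the same two characters, U+00F8 twice.
print(all(text == "\u00f8\u00f8" for text in results.values()))
```

Note that the us-ascii document could not contain the character typed directly, yet the reference still works.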

For a few of these characters, there are also "character entity
references" defined for HTML, such as &oslash; (meaning the same "o
with a slash" character as before). These are a bit more readable than
the raw numbers. However, remember that they're defined by the HTML
DTDs, not by XML itself, which predefines only &lt;, &gt;, &amp;,
&apos; and &quot;. So you can use them in XHTML, but not in RSS.
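The difference shows up as soon as a plain XML parser meets a named reference with no DTD declaring it. A minimal sketch using Python's standard library follows; the helper xml_text is hypothetical and only wraps the parse call.

```python
import html
import xml.etree.ElementTree as ET

# An HTML-aware decoder knows the named reference.
html_result = html.unescape("&oslash;")
print(html_result == "\u00f8")


def xml_text(doc):
    """Parse a tiny XML document and return its text, or the parse error."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as exc:
        return f"error: {exc}"


named = xml_text("<p>&oslash;</p>")    # not declared anywhere, so parsing fails
numeric = xml_text("<p>&#248;</p>")    # numeric reference, always available
predefined = xml_text("<p>&amp;</p>")  # one of the names XML itself predefines

print(named)
print(numeric == "\u00f8", predefined == "&")
```

In a format like RSS, the numeric form is therefore the safe way to write such a character when it cannot be typed directly.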

(I've confused some definitions here between bytes / octets,
characters / codepoints and Unicode / UCS / ISO10646 in an attempt at
brevity, if not clarity. Jukka will probably accuse me of "worthless
babbling" again as a result)
