xemacs-beta (Mercurial): comparison of lisp/unicode.el @ 4834:b3ea9c582280
Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
field | value |
---|---|
author | Ben Wing <ben@xemacs.org> |
date | Tue, 12 Jan 2010 01:38:04 -0600 |
parents | 980575c76541 |
children | c0934cef10c6 |
4833:4dd2389173fc | 4834:b3ea9c582280 |
---|---|
331 | 331 |
332 A fixed-width four-byte encoding, characters less than #x10FFFF are not | 332 A fixed-width four-byte encoding, characters less than #x10FFFF are not |
333 supported. " | 333 supported. " |
334 unicode-type ucs-4 little-endian t)) | 334 unicode-type ucs-4 little-endian t)) |
335 | 335 |
336 (make-coding-system | 336 ;; Now defined in unicode.c. |
337 'utf-8 'unicode | 337 |
338 "UTF-8" | 338 ;;(make-coding-system |
339 '(mnemonic "UTF8" | 339 ;; 'utf-8 'unicode |
340 documentation " | 340 ;; "UTF-8" |
341 UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding | 341 ;; '(mnemonic "UTF8" |
342 sharing the following principles with the Mule-internal encoding: | 342 ;; documentation "..." |
343 | 343 ;; unicode-type utf-8)) |
344 -- All ASCII characters (codepoints 0 through 127) are represented | |
345 by themselves (i.e. using one byte, with the same value as the | |
346 ASCII codepoint), and these bytes are disjoint from bytes | |
347 representing non-ASCII characters. | |
348 | |
349 This means that any 8-bit clean application can safely process | |
350 UTF-8-encoded text as it were ASCII, with no corruption (e.g. a | |
351 '/' byte is always a slash character, never the second byte of | |
352 some other character, as with Big5, so a pathname encoded in | |
353 UTF-8 can safely be split up into components and reassembled | |
354 again using standard ASCII processes). | |
355 | |
356 -- Leading bytes and non-leading bytes in the encoding of a | |
357 character are disjoint, so moving backwards is easy. | |
358 | |
359 -- Given only the leading byte, you know how many following bytes | |
360 are present. | |
361 " | |
362 unicode-type utf-8)) | |
363 | 344 |
364 (make-coding-system | 345 (make-coding-system |
365 'utf-8-bom 'unicode | 346 'utf-8-bom 'unicode |
366 "UTF-8 w/BOM" | 347 "UTF-8 w/BOM" |
367 '(mnemonic "MSW-UTF8" | 348 '(mnemonic "MSW-UTF8" |