771
+ − 1 /* Text encoding conversion functions; coding-system object.
+ − 2 #### rename me to coding-system.c or coding.c
428
+ − 3 Copyright (C) 1991, 1995 Free Software Foundation, Inc.
+ − 4 Copyright (C) 1995 Sun Microsystems, Inc.
771
+ − 5 Copyright (C) 2000, 2001, 2002 Ben Wing.
428
+ − 6
+ − 7 This file is part of XEmacs.
+ − 8
+ − 9 XEmacs is free software; you can redistribute it and/or modify it
+ − 10 under the terms of the GNU General Public License as published by the
+ − 11 Free Software Foundation; either version 2, or (at your option) any
+ − 12 later version.
+ − 13
+ − 14 XEmacs is distributed in the hope that it will be useful, but WITHOUT
+ − 15 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ − 16 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ − 17 for more details.
+ − 18
+ − 19 You should have received a copy of the GNU General Public License
+ − 20 along with XEmacs; see the file COPYING. If not, write to
+ − 21 the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ − 22 Boston, MA 02111-1307, USA. */
+ − 23
771
+ − 24 /* Synched up with: Not in FSF. */
+ − 25
+ − 26 /* Authorship:
+ − 27
+ − 28 Current primary author: Ben Wing <ben@xemacs.org>
+ − 29
+ − 30 Rewritten by Ben Wing <ben@xemacs.org>, based originally on coding.c
+ − 31 from Mule 2.? but probably does not share one line of code with that
+ − 32 original source. Rewriting work started around Dec. 1994. or Jan. 1995.
+ − 33 Proceeded in earnest till Nov. 1995.
+ − 34
+ − 35 Around Feb. 17, 1998, Andy Piper renamed what was then mule-coding.c to
+ − 36 file-coding.c, with the intention of using it to do end-of-line conversion
+ − 37 on non-MULE machines (specifically, on Windows machines). He separated
+ − 38 out the MULE stuff from non-MULE using ifdef's, and searched throughout
+ − 39 the rest of the source tree looking for coding-system-related code that
+ − 40 was ifdef MULE but should be ifdef HAVE_CODING_SYSTEMS.
+ − 41
+ − 42 Sept. 4 - 8, 1998, Tomohiko Morioka added the UCS_4 and UTF_8 coding system
+ − 43 types, providing a primitive means of decoding and encoding externally-
+ − 44 formatted Unicode/UCS_4 and Unicode/UTF_8 data.
+ − 45
+ − 46 January 25, 2000, Martin Buchholz redid and fleshed out the coding
+ − 47 system alias handling that was first added in prototype form by
+ − 48 Hrjove Niksic, April 15, 1999.
+ − 49
+ − 50 April to May 2000, Ben Wing: More major reorganization. Adding features
+ − 51 needed for MS Windows (multibyte, unicode, unicode-to-multibyte), the
+ − 52 "chain" coding system for chaining two together, and doing a lot of
+ − 53 reorganization in preparation for properly abstracting out the different
+ − 54 coding system types.
+ − 55
+ − 56 June 2001, Ben Wing: Added Unicode support. Eliminated previous
+ − 57 junky Unicode translation support.
+ − 58
+ − 59 August 2001, Ben Wing: Moved Unicode support to unicode.c. Finished
+ − 60 abstracting everything except detection, which is hard to abstract (see
+ − 61 just below).
+ − 62
+ − 63 September 2001, Ben Wing: Moved Mule code to mule-coding.c, Windows code
+ − 64 to intl-win32.c. Lots more rewriting; very little code is untouched
+ − 65 from before April 2000. Abstracted the detection code, added multiple
+ − 66 levels of likelihood to increase the reliability of the algorithm.
+ − 67
+ − 68 October 2001, Ben Wing: HAVE_CODING_SYSTEMS is always now defined.
+ − 69 Removed the conditionals.
+ − 70 */
+ − 71
+ − 72 /* Comments about future work
+ − 73
+ − 74 ------------------------------------------------------------------
+ − 75 ABOUT DETECTION
+ − 76 ------------------------------------------------------------------
+ − 77
+ − 78 however, in general the detection code has major problems and needs lots
+ − 79 of work:
+ − 80
+ − 81 -- instead of merely "yes" or "no" for particular categories, we need a
+ − 82 more flexible system, with various levels of likelihood. Currently
+ − 83 I've created a system with six levels, as follows:
+ − 84
+ − 85 [see file-coding.h]
+ − 86
+ − 87 Let's consider what this might mean for an ASCII text detector. (In
+ − 88 order to have accurate detection, especially given the iteration I
+ − 89 proposed below, we need active detectors for *all* types of data we
+ − 90 might reasonably encounter, such as ASCII text files, binary files,
+ − 91 and possibly other sorts of ASCII files, and not assume that simply
+ − 92 "falling back to no detection" will work at all well.)
+ − 93
+ − 94 An ASCII text detector DOES NOT report ASCII text as level 0, since
+ − 95 that's what the detector is looking for. Such a detector ideally
+ − 96 wants all bytes in the range 0x20 - 0x7E (no high bytes!), except for
+ − 97 whitespace control chars and perhaps a few others; LF, CR, or CRLF
+ − 98 sequences at regular intervals (where "regular" might mean an average
+ − 99 < 100 chars and 99% < 300 for code and other stuff of the "text file
+ − 100 w/line breaks" variety, but for the "text file w/o line breaks"
+ − 101 variety, excluding blank lines, averages could easily be 600 or more
+ − 102 with 2000-3000 char "lines" not so uncommon); similar statistical
+ − 103 variance between odds and evens (not Unicode); frequent occurrences of
+ − 104 the space character; letters more common than non-letters; etc. Also
+ − 105 checking for too little variability between frequencies of characters
+ − 106 and for exclusion of particular characters based on character ranges
+ − 107 can catch ASCII encodings like base-64, UUEncode, UTF-7, etc.
+ − 108 Granted, this doesn't even apply to everything called "ASCII", and we
+ − 109 could potentially distinguish off ASCII for code, ASCII for text,
+ − 110 etc. as separate categories. However, it does give us a lot to work
+ − 111 off of, in deciding what likelihood to choose -- and it shows there's
+ − 112 in fact a lot of detectable patterns to look for even in something
+ − 113 seemingly so generic as ASCII. The detector would report most text
+ − 114 files in level 1 or level 2. EUC encodings, Shift-JIS, etc. probably
+ − 115 go to level -1 because they also pass the EOL test and all other tests
+ − 116 for the ASCII part of the text, but have lots of high bytes, which in
+ − 117 essence turn them into binary. Aberrant text files like something in
+ − 118 BASE64 encoding might get placed in level 0, because they pass most
+ − 119 tests but fail dramatically the frequency test; but they should not be
+ − 120 reported as any lower, because that would cause explicit prompting,
+ − 121 and the user should be able any valid text file without prompting.
+ − 122 The escape sequences and the base-64-type checks might send 7-bit
+ − 123 iso2022 to 0, but probably not -1, for similar reasons.
+ − 124
+ − 125 -- The assumed algorithm for the above detection levels is to in essence
+ − 126 sort categories first by detection level and then by priority.
+ − 127 Perhaps, however, we would want smarter algorithms, or at least
+ − 128 something user-controllable -- in particular, when (other than no
+ − 129 category at level 0 or greater) do we prompt the user to pick a
+ − 130 category?
+ − 131
+ − 132 -- Improvements in how the detection algorithm works: we want to handle
+ − 133 lots of different ways something could be encoded, including multiple
+ − 134 stacked encodings. trying to specify a series of detection levels
+ − 135 (check for base64 first, then check for gzip, then check for an i18n
+ − 136 decoding, then for crlf) won't generally work. for example, what
+ − 137 about the same encoding appearing more than once? for example, take
+ − 138 euc-jp, base64'd, then gzip'd, then base64'd again: this could well
+ − 139 happen, and you could specify the encodings specifically as
+ − 140 base64|gzip|base64|euc-jp, but we'd like to autodetect it without
+ − 141 worrying about exactly what order these things appear in. we should
+ − 142 allow for iterating over detection/decoding cycles until we reach
+ − 143 some maximum (we got stuck in a loop, due to incorrect category
+ − 144 tables or detection algorithms), have no reported detection levels
+ − 145 over -1, or we end up with no change after a decoding pass (i.e. the
+ − 146 coding system associated with a chosen category was `no-conversion'
+ − 147 or something equivalent). it might make sense to divide things into
+ − 148 two phases (internal and external), where the internal phase has a
+ − 149 separate category list and would probably mostly end up handling EOL
+ − 150 detection; but the i think about it, the more i disagree. with
+ − 151 properly written detectors, and properly organized tables (in
+ − 152 general, those decodings that are more "distinctive" and thus
+ − 153 detectable with greater certainty go lower on the list), we shouldn't
+ − 154 need two phases. for example, let's say the example above was also
+ − 155 in CRLF format. The EOL detector (which really detects *plain text*
+ − 156 with a particular EOL type) would return at most level 0 for all
+ − 157 results until the text file is reached, whereas the base64, gzip or
+ − 158 euc-jp decoders will return higher. Once the text file is reached,
+ − 159 the EOL detector will return 0 or higher for the CRLF encoding, and
+ − 160 all other decoders will return 0 or lower; thus, we will successfully
+ − 161 proceed through CRLF decoding, or at worst prompt the user. (The only
+ − 162 external-vs-internal distinction that might make sense here is to
+ − 163 favor coding systems of the correct source type over those that
+ − 164 require conversion between external and internal; if done right, this
+ − 165 could allow the CRLF detector to return level 1 for all CRLF-encoded
+ − 166 text files, even those that look like Base-64 or similar encoding, so
+ − 167 that CRLF encoding will always get decoded without prompting, but not
+ − 168 interfere with other decoders. On the other hand, this
+ − 169 external-vs-internal distinction may not matter at all -- with
+ − 170 automatic internal-external conversion, CRLF decoding can occur
+ − 171 before or after decoding of euc-jp, base64, iso2022, or similar,
+ − 172 without any difference in the final results.)
+ − 173
+ − 174 -- There need to be two priority lists and two
+ − 175 category->coding-system lists. Once is general, the other
+ − 176 category->langenv-specific. The user sets the former, the langenv
+ − 177 category->the latter. The langenv-specific entries take precedence
+ − 178 category->over the others. This works similarly to the
+ − 179 category->category->Unicode charset priority list.
+ − 180
+ − 181 -- The simple list of coding categories per detectors is not enough.
+ − 182 Instead of coding categories, we need parameters. For example,
+ − 183 Unicode might have separate detectors for UTF-8, UTF-7, UTF-16,
+ − 184 and perhaps UCS-4; or UTF-16/UCS-4 would be one detection type.
+ − 185 UTF-16 would have parameters such as "little-endian" and "needs BOM",
+ − 186 and possibly another one like "collapse/expand/leave alone composite
+ − 187 sequences" once we add this support. Usually these parameters
+ − 188 correspond directly to a coding system parameter. Different
+ − 189 likelihood values can be specified for each parameter as well as for
+ − 190 the detection type as a whole. The user can specify particular
+ − 191 coding systems for a particular combination of detection type and
+ − 192 parameters, or can give "default parameters" associated with a
+ − 193 detection type. In the latter case, we create a new coding system as
+ − 194 necessary that corresponds to the detected type and parameters.
+ − 195
+ − 196 -- a better means of presentation. rather than just coming up
+ − 197 with the new file decoded according to the detected coding
+ − 198 system, allow the user to browse through the file and
+ − 199 conveniently reject it if it looks wrong; then detection
+ − 200 starts again, but with that possibility removed. in cases where
+ − 201 certainty is low and thus more than one possibility is presented,
+ − 202 the user can browse each one and select one or reject them all.
+ − 203
+ − 204 -- fail-safe: even after the user has made a choice, if they
+ − 205 later on realize they have the wrong coding system, they can
+ − 206 go back, and we've squirreled away the original data so they
+ − 207 can start the process over. this may be tricky.
+ − 208
+ − 209 -- using a larger buffer for detection. we use just a small
+ − 210 piece, which can give quite random results. we may need to
+ − 211 buffer up all the data we look through because we can't
+ − 212 necessarily rewind. the idea is we proceed until we get a
+ − 213 result that's at least at a certain level of certainty
+ − 214 (e.g. "probable") or we reached a maximum limit of how much
+ − 215 we want to buffer.
+ − 216
+ − 217 -- dealing with interactive systems. we might need to go ahead
+ − 218 and present the data before we've finished detection, and
+ − 219 then re-decode it, perhaps multiple times, as we get better
+ − 220 detection results.
+ − 221
+ − 222 -- Clearly some of these are more important than others. at the
+ − 223 very least, the "better means of presentation" should be
+ − 224 implementation as soon as possibl, along with a very simple means
+ − 225 of fail-safe whenever the data is readibly available, e.g. it's
+ − 226 coming from a file, which is the most common scenario.
+ − 227
+ − 228
+ − 229 ------------------------------------------------------------------
+ − 230 ABOUT FORMATS
+ − 231 ------------------------------------------------------------------
+ − 232
+ − 233 when calling make-coding-system, the name can be a cons of (format1 .
+ − 234 format2), specifying that it decodes format1->format2 and encodes the other
+ − 235 way. if only one name is given, that is assumed to be format1, and the
+ − 236 other is either `external' or `internal' depending on the end type.
+ − 237 normally the user when decoding gives the decoding order in formats, but
+ − 238 can leave off the last one, `internal', which is assumed. a multichain
+ − 239 might look like gzip|multibyte|unicode, using the coding systems named
+ − 240 `gzip', `(unicode . multibyte)' and `unicode'. the way this actually works
+ − 241 is by searching for gzip->multibyte; if not found, look for gzip->external
+ − 242 or gzip->internal. (In general we automatically do conversion between
+ − 243 internal and external as necessary: thus gzip|crlf does the expected, and
+ − 244 maps to gzip->external, external->internal, crlf->internal, which when
+ − 245 fully specified would be gzip|external:external|internal:crlf|internal --
+ − 246 see below.) To forcibly fit together two converters that have explicitly
+ − 247 specified and incompatible names (say you have unicode->multibyte and
+ − 248 iso8859-1->ebcdic and you know that the multibyte and iso8859-1 in this
+ − 249 case are compatible), you can force-cast using :, like this:
+ − 250 ebcdic|iso8859-1:multibyte|unicode. (again, if you force-cast between
+ − 251 internal and external formats, the conversion happens automatically.)
+ − 252
+ − 253 --------------------------------------------------------------------------
+ − 254 ABOUT PDUMP, UNICODE, AND RUNNING XEMACS FROM A DIRECTORY WITH WEIRD CHARS
+ − 255 --------------------------------------------------------------------------
+ − 256
+ − 257 -- there's the problem that XEmacs can't be run in a directory with
+ − 258 non-ASCII/Latin-1 chars in it, since it will be doing Unicode
+ − 259 processing before we've had a chance to load the tables. In fact,
+ − 260 even finding the tables in such a situation is problematic using
+ − 261 the normal commands. my idea is to eventually load the stuff
+ − 262 extremely extremely early, at the same time as the pdump data gets
+ − 263 loaded. in fact, the unicode table data (stored in an efficient
+ − 264 binary format) can even be stuck into the pdump file (which would
+ − 265 mean as a resource to the executable, for windows). we'd need to
+ − 266 extend pdump a bit: to allow for attaching extra data to the pdump
+ − 267 file. (something like pdump_attach_extra_data (addr, length)
+ − 268 returns a number of some sort, an index into the file, which you
+ − 269 can then retrieve with pdump_load_extra_data(), which returns an
+ − 270 addr (mmap()ed or loaded), and later you pdump_unload_extra_data()
+ − 271 when finished. we'd probably also need
+ − 272 pdump_attach_extra_data_append(), which appends data to the data
+ − 273 just written out with pdump_attach_extra_data(). this way,
+ − 274 multiple tables in memory can be written out into one contiguous
+ − 275 table. (we'd use the tar-like trick of allowing new blocks to be
+ − 276 written without going back to change the old blocks -- we just rely
+ − 277 on the end of file/end of memory.) this same mechanism could be
+ − 278 extracted out of pdump and used to handle the non-pdump situation
+ − 279 (or alternatively, we could just dump either the memory image of
+ − 280 the tables themselves or the compressed binary version). in the
+ − 281 case of extra unicode tables not known about at compile time that
+ − 282 get loaded before dumping, we either just dump them into the image
+ − 283 (pdump and all) or extract them into the compressed binary format,
+ − 284 free the original tables, and treat them like all other tables.
+ − 285
+ − 286 --------------------------------------------------------------------------
+ − 287 HANDLING WRITING A FILE SAFELY, WITHOUT DATA LOSS
+ − 288 --------------------------------------------------------------------------
+ − 289
+ − 290 -- When writing a file, we need error detection; otherwise somebody
+ − 291 will create a Unicode file without realizing the coding system
+ − 292 of the buffer is Raw, and then lose all the non-ASCII/Latin-1
+ − 293 text when it's written out. We need two levels
+ − 294
+ − 295 1. first, a "safe-charset" level that checks before any actual
+ − 296 encoding to see if all characters in the document can safely
+ − 297 be represented using the given coding system. FSF has a
+ − 298 "safe-charset" property of coding systems, but it's stupid
+ − 299 because this information can be automatically derived from
+ − 300 the coding system, at least the vast majority of the time.
+ − 301 What we need is some sort of
+ − 302 alternative-coding-system-precedence-list, langenv-specific,
+ − 303 where everything on it can be checked for safe charsets and
+ − 304 then the user given a list of possibilities. When the user
+ − 305 does "save with specified encoding", they should see the same
+ − 306 precedence list. Again like with other precedence lists,
+ − 307 there's also a global one, and presumably all coding systems
+ − 308 not on other list get appended to the end (and perhaps not
+ − 309 checked at all when doing safe-checking?). safe-checking
+ − 310 should work something like this: compile a list of all
+ − 311 charsets used in the buffer, along with a count of chars
+ − 312 used. that way, "slightly unsafe" charsets can perhaps be
+ − 313 presented at the end, which will lose only a few characters
+ − 314 and are perhaps what the users were looking for.
+ − 315
+ − 316 2. when actually writing out, we need error checking in case an
+ − 317 individual char in a charset can't be written even though the
+ − 318 charsets are safe. again, the user gets the choice of other
+ − 319 reasonable coding systems.
+ − 320
+ − 321 3. same thing (error checking, list of alternatives, etc.) needs
+ − 322 to happen when reading! all of this will be a lot of work!
+ − 323
+ − 324
+ − 325 --ben
+ − 326 */
428
+ − 327
+ − 328 #include <config.h>
+ − 329 #include "lisp.h"
+ − 330
+ − 331 #include "buffer.h"
+ − 332 #include "elhash.h"
+ − 333 #include "insdel.h"
+ − 334 #include "lstream.h"
440
+ − 335 #include "opaque.h"
771
+ − 336 #include "file-coding.h"
+ − 337
+ − 338 #ifdef HAVE_ZLIB
+ − 339 #include "zlib.h"
428
+ − 340 #endif
+ − 341
+ − 342 Lisp_Object Vkeyboard_coding_system;
+ − 343 Lisp_Object Vterminal_coding_system;
+ − 344 Lisp_Object Vcoding_system_for_read;
+ − 345 Lisp_Object Vcoding_system_for_write;
+ − 346 Lisp_Object Vfile_name_coding_system;
+ − 347
771
+ − 348 #ifdef DEBUG_XEMACS
+ − 349 Lisp_Object Vdebug_coding_detection;
440
+ − 350 #endif
771
+ − 351
+ − 352 typedef struct coding_system_type_entry
+ − 353 {
+ − 354 struct coding_system_methods *meths;
+ − 355 } coding_system_type_entry;
+ − 356
+ − 357 typedef struct
+ − 358 {
+ − 359 Dynarr_declare (coding_system_type_entry);
+ − 360 } coding_system_type_entry_dynarr;
+ − 361
+ − 362 static coding_system_type_entry_dynarr *the_coding_system_type_entry_dynarr;
+ − 363
+ − 364 static const struct lrecord_description cste_description_1[] = {
+ − 365 { XD_STRUCT_PTR, offsetof (coding_system_type_entry, meths), 1, &coding_system_methods_description },
+ − 366 { XD_END }
+ − 367 };
+ − 368
+ − 369 static const struct struct_description cste_description = {
+ − 370 sizeof (coding_system_type_entry),
+ − 371 cste_description_1
+ − 372 };
+ − 373
+ − 374 static const struct lrecord_description csted_description_1[] = {
+ − 375 XD_DYNARR_DESC (coding_system_type_entry_dynarr, &cste_description),
428
+ − 376 { XD_END }
+ − 377 };
+ − 378
771
+ − 379 static const struct struct_description csted_description = {
+ − 380 sizeof (coding_system_type_entry_dynarr),
+ − 381 csted_description_1
+ − 382 };
+ − 383
+ − 384 static Lisp_Object Vcoding_system_type_list;
+ − 385
+ − 386 /* Coding system currently associated with each coding category. */
+ − 387 Lisp_Object coding_category_system[MAX_DETECTOR_CATEGORIES];
+ − 388
+ − 389 /* Table of all coding categories in decreasing order of priority.
+ − 390 This describes a permutation of the possible coding categories. */
+ − 391 int coding_category_by_priority[MAX_DETECTOR_CATEGORIES];
+ − 392
+ − 393 /* Value used with to give a unique name to nameless coding systems */
+ − 394 int coding_system_tick;
+ − 395
+ − 396 int coding_detector_count;
+ − 397 int coding_detector_category_count;
+ − 398
+ − 399 detector_dynarr *all_coding_detectors;
+ − 400
+ − 401 static const struct lrecord_description struct_detector_category_description_1[]
+ − 402 =
+ − 403 {
+ − 404 { XD_LISP_OBJECT, offsetof (struct detector_category, sym) },
+ − 405 { XD_END }
+ − 406 };
+ − 407
+ − 408 static const struct struct_description struct_detector_category_description =
+ − 409 {
+ − 410 sizeof (struct detector_category),
+ − 411 struct_detector_category_description_1
428
+ − 412 };
+ − 413
771
+ − 414 static const struct lrecord_description detector_category_dynarr_description_1[] =
+ − 415 {
+ − 416 XD_DYNARR_DESC (detector_category_dynarr,
+ − 417 &struct_detector_category_description),
+ − 418 { XD_END }
+ − 419 };
+ − 420
+ − 421 static const struct struct_description detector_category_dynarr_description = {
+ − 422 sizeof (detector_category_dynarr),
+ − 423 detector_category_dynarr_description_1
+ − 424 };
+ − 425
+ − 426 static const struct lrecord_description struct_detector_description_1[]
+ − 427 =
+ − 428 {
+ − 429 { XD_STRUCT_PTR, offsetof (struct detector, cats), 1,
+ − 430 &detector_category_dynarr_description },
+ − 431 { XD_END }
+ − 432 };
+ − 433
+ − 434 static const struct struct_description struct_detector_description =
+ − 435 {
+ − 436 sizeof (struct detector),
+ − 437 struct_detector_description_1
+ − 438 };
+ − 439
+ − 440 static const struct lrecord_description detector_dynarr_description_1[] =
+ − 441 {
+ − 442 XD_DYNARR_DESC (detector_dynarr, &struct_detector_description),
+ − 443 { XD_END }
+ − 444 };
+ − 445
+ − 446 static const struct struct_description detector_dynarr_description = {
+ − 447 sizeof (detector_dynarr),
+ − 448 detector_dynarr_description_1
+ − 449 };
428
+ − 450
+ − 451 Lisp_Object Qcoding_systemp;
+ − 452
771
+ − 453 Lisp_Object Qraw_text;
428
+ − 454
+ − 455 Lisp_Object Qmnemonic, Qeol_type;
+ − 456 Lisp_Object Qcr, Qcrlf, Qlf;
+ − 457 Lisp_Object Qeol_cr, Qeol_crlf, Qeol_lf;
+ − 458 Lisp_Object Qpost_read_conversion;
+ − 459 Lisp_Object Qpre_write_conversion;
+ − 460
771
+ − 461 Lisp_Object Qtranslation_table_for_decode;
+ − 462 Lisp_Object Qtranslation_table_for_encode;
+ − 463 Lisp_Object Qsafe_chars;
+ − 464 Lisp_Object Qsafe_charsets;
+ − 465 Lisp_Object Qmime_charset;
+ − 466 Lisp_Object Qvalid_codes;
+ − 467
+ − 468 Lisp_Object Qno_conversion;
+ − 469 Lisp_Object Qconvert_eol;
440
+ − 470 Lisp_Object Qescape_quoted;
771
+ − 471 Lisp_Object Qencode, Qdecode;
+ − 472
+ − 473 Lisp_Object Qconvert_eol_lf, Qconvert_eol_cr, Qconvert_eol_crlf;
+ − 474 Lisp_Object Qconvert_eol_autodetect;
+ − 475
+ − 476 Lisp_Object Qnear_certainty, Qquite_probable, Qsomewhat_likely;
+ − 477 Lisp_Object Qas_likely_as_unlikely, Qsomewhat_unlikely, Qquite_improbable;
+ − 478 Lisp_Object Qnearly_impossible;
+ − 479
+ − 480 Lisp_Object Qdo_eol, Qdo_coding;
+ − 481
+ − 482 Lisp_Object Qcanonicalize_after_coding;
+ − 483
+ − 484 /* This is used to convert autodetected coding systems into existing
+ − 485 systems. For example, the chain undecided->convert-eol-autodetect may
+ − 486 have its separate parts detected as mswindows-multibyte and
+ − 487 convert-eol-crlf, and the result needs to be mapped to
+ − 488 mswindows-multibyte-dos. */
+ − 489 /* #### It's not clear we need this whole chain-canonicalize mechanism
+ − 490 any more. */
+ − 491 static Lisp_Object Vchain_canonicalize_hash_table;
+ − 492
+ − 493 #ifdef HAVE_ZLIB
+ − 494 Lisp_Object Qgzip;
428
+ − 495 #endif
771
+ − 496
+ − 497 /* Maps coding system names to either coding system objects or (for
+ − 498 aliases) other names. */
+ − 499 static Lisp_Object Vcoding_system_hash_table;
428
+ − 500
+ − 501 int enable_multibyte_characters;
+ − 502
+ − 503 EXFUN (Fcopy_coding_system, 2);
+ − 504
+ − 505
+ − 506 /************************************************************************/
771
+ − 507 /* Coding system object methods */
428
+ − 508 /************************************************************************/
+ − 509
+ − 510 static Lisp_Object
+ − 511 mark_coding_system (Lisp_Object obj)
+ − 512 {
+ − 513 Lisp_Coding_System *codesys = XCODING_SYSTEM (obj);
+ − 514
+ − 515 mark_object (CODING_SYSTEM_NAME (codesys));
771
+ − 516 mark_object (CODING_SYSTEM_DESCRIPTION (codesys));
428
+ − 517 mark_object (CODING_SYSTEM_MNEMONIC (codesys));
771
+ − 518 mark_object (CODING_SYSTEM_DOCUMENTATION (codesys));
428
+ − 519 mark_object (CODING_SYSTEM_EOL_LF (codesys));
+ − 520 mark_object (CODING_SYSTEM_EOL_CRLF (codesys));
+ − 521 mark_object (CODING_SYSTEM_EOL_CR (codesys));
771
+ − 522 mark_object (CODING_SYSTEM_SUBSIDIARY_PARENT (codesys));
+ − 523 mark_object (CODING_SYSTEM_CANONICAL (codesys));
+ − 524
+ − 525 MAYBE_CODESYSMETH (codesys, mark, (obj));
428
+ − 526
+ − 527 mark_object (CODING_SYSTEM_PRE_WRITE_CONVERSION (codesys));
+ − 528 return CODING_SYSTEM_POST_READ_CONVERSION (codesys);
+ − 529 }
+ − 530
+ − 531 static void
771
+ − 532 print_coding_system_properties (Lisp_Object obj, Lisp_Object printcharfun)
+ − 533 {
+ − 534 Lisp_Coding_System *c = XCODING_SYSTEM (obj);
+ − 535 print_internal (c->methods->type, printcharfun, 1);
+ − 536 MAYBE_CODESYSMETH (c, print, (obj, printcharfun, 1));
+ − 537 if (CODING_SYSTEM_EOL_TYPE (c) != EOL_AUTODETECT)
+ − 538 write_fmt_string_lisp (printcharfun, " eol-type=%s",
+ − 539 1, Fcoding_system_property (obj, Qeol_type));
+ − 540 }
+ − 541
+ − 542 static void
428
+ − 543 print_coding_system (Lisp_Object obj, Lisp_Object printcharfun,
+ − 544 int escapeflag)
+ − 545 {
+ − 546 Lisp_Coding_System *c = XCODING_SYSTEM (obj);
+ − 547 if (print_readably)
771
+ − 548 printing_unreadable_object
+ − 549 ("printing unreadable object #<coding-system 0x%x>", c->header.uid);
+ − 550
+ − 551 write_fmt_string_lisp (printcharfun, "#<coding-system %s ", 1, c->name);
+ − 552 print_coding_system_properties (obj, printcharfun);
826
+ − 553 write_c_string (printcharfun, ">");
428
+ − 554 }
+ − 555
771
+ − 556 /* Print an abbreviated version of a coding system (but still containing
+ − 557 all the information), for use within a coding system print method. */
+ − 558
+ − 559 static void
+ − 560 print_coding_system_in_print_method (Lisp_Object cs, Lisp_Object printcharfun,
+ − 561 int escapeflag)
+ − 562 {
800
+ − 563 write_fmt_string_lisp (printcharfun, "%s[", 1, XCODING_SYSTEM_NAME (cs));
771
+ − 564 print_coding_system_properties (cs, printcharfun);
826
+ − 565 write_c_string (printcharfun, "]");
771
+ − 566 }
+ − 567
428
+ − 568 static void
+ − 569 finalize_coding_system (void *header, int for_disksave)
+ − 570 {
771
+ − 571 Lisp_Object cs = wrap_coding_system ((Lisp_Coding_System *) header);
428
+ − 572 /* Since coding systems never go away, this function is not
+ − 573 necessary. But it would be necessary if we changed things
+ − 574 so that coding systems could go away. */
+ − 575 if (!for_disksave) /* see comment in lstream.c */
771
+ − 576 MAYBE_XCODESYSMETH (cs, finalize, (cs));
+ − 577 }
+ − 578
+ − 579 static Bytecount
+ − 580 sizeof_coding_system (const void *header)
+ − 581 {
+ − 582 const Lisp_Coding_System *p = (const Lisp_Coding_System *) header;
+ − 583 return offsetof (Lisp_Coding_System, data) + p->methods->extra_data_size;
428
+ − 584 }
+ − 585
771
+ − 586 static const struct lrecord_description coding_system_methods_description_1[]
+ − 587 = {
+ − 588 { XD_LISP_OBJECT,
+ − 589 offsetof (struct coding_system_methods, type) },
+ − 590 { XD_LISP_OBJECT,
+ − 591 offsetof (struct coding_system_methods, predicate_symbol) },
+ − 592 { XD_END }
+ − 593 };
+ − 594
+ − 595 const struct struct_description coding_system_methods_description = {
+ − 596 sizeof (struct coding_system_methods),
+ − 597 coding_system_methods_description_1
+ − 598 };
+ − 599
+ − 600 const struct lrecord_description coding_system_empty_extra_description[] = {
+ − 601 { XD_END }
+ − 602 };
+ − 603
+ − 604 static const struct lrecord_description coding_system_description[] =
428
+ − 605 {
771
+ − 606 { XD_STRUCT_PTR, offsetof (Lisp_Coding_System, methods), 1,
+ − 607 &coding_system_methods_description },
+ − 608 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, name) },
+ − 609 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, description) },
+ − 610 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, mnemonic) },
+ − 611 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, documentation) },
+ − 612 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, post_read_conversion) },
+ − 613 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, pre_write_conversion) },
+ − 614 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, text_file_wrapper) },
+ − 615 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, auto_eol_wrapper) },
+ − 616 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, eol[0]) },
+ − 617 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, eol[1]) },
+ − 618 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, eol[2]) },
+ − 619 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, subsidiary_parent) },
+ − 620 { XD_LISP_OBJECT, offsetof (Lisp_Coding_System, canonical) },
+ − 621 { XD_CODING_SYSTEM_END }
+ − 622 };
+ − 623
934
+ − 624 #ifdef USE_KKCC
+ − 625 DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION ("coding-system", coding_system,
+ − 626 1, /*dumpable-flag*/
+ − 627 mark_coding_system,
+ − 628 print_coding_system,
+ − 629 finalize_coding_system,
+ − 630 0, 0, coding_system_description,
+ − 631 sizeof_coding_system,
+ − 632 Lisp_Coding_System);
+ − 633 #else /* not USE_KKCC */
771
+ − 634 DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION ("coding-system", coding_system,
+ − 635 mark_coding_system,
+ − 636 print_coding_system,
+ − 637 finalize_coding_system,
+ − 638 0, 0, coding_system_description,
+ − 639 sizeof_coding_system,
+ − 640 Lisp_Coding_System);
934
+ − 641 #endif /* not USE_KKCC */
771
+ − 642
+ − 643 /************************************************************************/
+ − 644 /* Creating coding systems */
+ − 645 /************************************************************************/
+ − 646
+ − 647 static struct coding_system_methods *
+ − 648 decode_coding_system_type (Lisp_Object type, Error_Behavior errb)
428
+ − 649 {
771
+ − 650 int i;
+ − 651
+ − 652 for (i = 0; i < Dynarr_length (the_coding_system_type_entry_dynarr); i++)
428
+ − 653 {
771
+ − 654 if (EQ (type,
+ − 655 Dynarr_at (the_coding_system_type_entry_dynarr, i).meths->type))
+ − 656 return Dynarr_at (the_coding_system_type_entry_dynarr, i).meths;
428
+ − 657 }
771
+ − 658
+ − 659 maybe_invalid_constant ("Invalid coding system type", type,
+ − 660 Qcoding_system, errb);
+ − 661
+ − 662 return 0;
428
+ − 663 }
+ − 664
771
+ − 665 static int
+ − 666 valid_coding_system_type_p (Lisp_Object type)
428
+ − 667 {
771
+ − 668 return decode_coding_system_type (type, ERROR_ME_NOT) != 0;
+ − 669 }
+ − 670
+ − 671 DEFUN ("valid-coding-system-type-p", Fvalid_coding_system_type_p, 1, 1, 0, /*
+ − 672 Given a CODING-SYSTEM-TYPE, return non-nil if it is valid.
+ − 673 Valid types depend on how XEmacs was compiled but may include
+ − 674 'undecided, 'chain, 'integer, 'ccl, 'iso2022, 'big5, 'shift-jis,
+ − 675 'utf-16, 'ucs-4, 'utf-8, etc.
+ − 676 */
+ − 677 (coding_system_type))
+ − 678 {
+ − 679 return valid_coding_system_type_p (coding_system_type) ? Qt : Qnil;
+ − 680 }
+ − 681
+ − 682 DEFUN ("coding-system-type-list", Fcoding_system_type_list, 0, 0, 0, /*
+ − 683 Return a list of valid coding system types.
+ − 684 */
+ − 685 ())
+ − 686 {
+ − 687 return Fcopy_sequence (Vcoding_system_type_list);
+ − 688 }
+ − 689
+ − 690 void
+ − 691 add_entry_to_coding_system_type_list (struct coding_system_methods *meths)
+ − 692 {
+ − 693 struct coding_system_type_entry entry;
+ − 694
+ − 695 entry.meths = meths;
+ − 696 Dynarr_add (the_coding_system_type_entry_dynarr, entry);
+ − 697 Vcoding_system_type_list = Fcons (meths->type, Vcoding_system_type_list);
428
+ − 698 }
+ − 699
+ − 700 DEFUN ("coding-system-p", Fcoding_system_p, 1, 1, 0, /*
+ − 701 Return t if OBJECT is a coding system.
+ − 702 A coding system is an object that defines how text containing multiple
+ − 703 character sets is encoded into a stream of (typically 8-bit) bytes.
+ − 704 The coding system is used to decode the stream into a series of
+ − 705 characters (which may be from multiple charsets) when the text is read
+ − 706 from a file or process, and is used to encode the text back into the
+ − 707 same format when it is written out to a file or process.
+ − 708
+ − 709 For example, many ISO2022-compliant coding systems (such as Compound
+ − 710 Text, which is used for inter-client data under the X Window System)
+ − 711 use escape sequences to switch between different charsets -- Japanese
+ − 712 Kanji, for example, is invoked with "ESC $ ( B"; ASCII is invoked
+ − 713 with "ESC ( B"; and Cyrillic is invoked with "ESC - L". See
+ − 714 `make-coding-system' for more information.
+ − 715
+ − 716 Coding systems are normally identified using a symbol, and the
+ − 717 symbol is accepted in place of the actual coding system object whenever
+ − 718 a coding system is called for. (This is similar to how faces work.)
+ − 719 */
+ − 720 (object))
+ − 721 {
+ − 722 return CODING_SYSTEMP (object) ? Qt : Qnil;
+ − 723 }
+ − 724
+ − 725 DEFUN ("find-coding-system", Ffind_coding_system, 1, 1, 0, /*
+ − 726 Retrieve the coding system of the given name.
+ − 727
+ − 728 If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
+ − 729 returned. Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
+ − 730 If there is no such coding system, nil is returned. Otherwise the
+ − 731 associated coding system object is returned.
+ − 732 */
+ − 733 (coding_system_or_name))
+ − 734 {
+ − 735 if (NILP (coding_system_or_name))
+ − 736 coding_system_or_name = Qbinary;
440
+ − 737 else if (CODING_SYSTEMP (coding_system_or_name))
+ − 738 return coding_system_or_name;
428
+ − 739 else
+ − 740 CHECK_SYMBOL (coding_system_or_name);
+ − 741
440
+ − 742 while (1)
+ − 743 {
+ − 744 coding_system_or_name =
+ − 745 Fgethash (coding_system_or_name, Vcoding_system_hash_table, Qnil);
+ − 746
771
+ − 747 if (CODING_SYSTEMP (coding_system_or_name)
+ − 748 || NILP (coding_system_or_name))
440
+ − 749 return coding_system_or_name;
+ − 750 }
428
+ − 751 }
+ − 752
+ − 753 DEFUN ("get-coding-system", Fget_coding_system, 1, 1, 0, /*
+ − 754 Retrieve the coding system of the given name.
+ − 755 Same as `find-coding-system' except that if there is no such
+ − 756 coding system, an error is signaled instead of returning nil.
+ − 757 */
+ − 758 (name))
+ − 759 {
+ − 760 Lisp_Object coding_system = Ffind_coding_system (name);
+ − 761
+ − 762 if (NILP (coding_system))
563
+ − 763 invalid_argument ("No such coding system", name);
428
+ − 764 return coding_system;
+ − 765 }
+ − 766
771
+ − 767 int
+ − 768 coding_system_is_binary (Lisp_Object coding_system)
+ − 769 {
+ − 770 Lisp_Coding_System *cs = XCODING_SYSTEM (coding_system);
+ − 771 return
+ − 772 (EQ (CODING_SYSTEM_TYPE (cs), Qno_conversion) &&
+ − 773 CODING_SYSTEM_EOL_TYPE (cs) == EOL_LF &&
+ − 774 EQ (CODING_SYSTEM_POST_READ_CONVERSION (cs), Qnil) &&
+ − 775 EQ (CODING_SYSTEM_PRE_WRITE_CONVERSION (cs), Qnil));
+ − 776 }
+ − 777
+ − 778 static Lisp_Object
+ − 779 coding_system_real_canonical (Lisp_Object cs)
+ − 780 {
+ − 781 if (!NILP (XCODING_SYSTEM_CANONICAL (cs)))
+ − 782 return XCODING_SYSTEM_CANONICAL (cs);
+ − 783 return cs;
+ − 784 }
+ − 785
+ − 786 /* Return true if coding system is of the "standard" type that decodes
+ − 787 bytes into characters (suitable for decoding a text file). */
+ − 788 int
+ − 789 coding_system_is_for_text_file (Lisp_Object coding_system)
+ − 790 {
+ − 791 return (XCODESYSMETH_OR_GIVEN
+ − 792 (coding_system, conversion_end_type,
+ − 793 (coding_system_real_canonical (coding_system)),
+ − 794 DECODES_BYTE_TO_CHARACTER) ==
+ − 795 DECODES_BYTE_TO_CHARACTER);
+ − 796 }
+ − 797
+ − 798 static int
+ − 799 decoding_source_sink_type_is_char (Lisp_Object cs, enum source_or_sink sex)
+ − 800 {
+ − 801 enum source_sink_type type =
+ − 802 XCODESYSMETH_OR_GIVEN (cs, conversion_end_type,
+ − 803 (coding_system_real_canonical (cs)),
+ − 804 DECODES_BYTE_TO_CHARACTER);
+ − 805 if (sex == CODING_SOURCE)
+ − 806 return (type == DECODES_CHARACTER_TO_CHARACTER ||
+ − 807 type == DECODES_CHARACTER_TO_BYTE);
+ − 808 else
+ − 809 return (type == DECODES_CHARACTER_TO_CHARACTER ||
+ − 810 type == DECODES_BYTE_TO_CHARACTER);
+ − 811 }
+ − 812
+ − 813 static int
+ − 814 encoding_source_sink_type_is_char (Lisp_Object cs, enum source_or_sink sex)
+ − 815 {
+ − 816 return decoding_source_sink_type_is_char (cs,
+ − 817 /* Sex change */
+ − 818 sex == CODING_SOURCE ?
+ − 819 CODING_SINK : CODING_SOURCE);
+ − 820 }
+ − 821
+ − 822 /* Like Ffind_coding_system() but check that the coding system is of the
+ − 823 "standard" type that decodes bytes into characters (suitable for
+ − 824 decoding a text file), and if not, returns an appropriate wrapper that
+ − 825 does. Also, if EOL_WRAP is non-zero, check whether this coding system
+ − 826 wants EOL auto-detection, and if so, wrap with a convert-eol coding
+ − 827 system to do this. */
+ − 828
+ − 829 Lisp_Object
+ − 830 find_coding_system_for_text_file (Lisp_Object name, int eol_wrap)
+ − 831 {
+ − 832 Lisp_Object coding_system = Ffind_coding_system (name);
+ − 833 Lisp_Object wrapper = coding_system;
+ − 834
+ − 835 if (NILP (coding_system))
+ − 836 return Qnil;
+ − 837 if (!coding_system_is_for_text_file (coding_system))
+ − 838 {
+ − 839 wrapper = XCODING_SYSTEM_TEXT_FILE_WRAPPER (coding_system);
+ − 840 if (NILP (wrapper))
+ − 841 {
+ − 842 Lisp_Object chain;
+ − 843 if (!decoding_source_sink_type_is_char (coding_system, CODING_SINK))
+ − 844 chain = list2 (coding_system, Qbinary);
+ − 845 else
+ − 846 chain = list1 (coding_system);
+ − 847 if (decoding_source_sink_type_is_char (coding_system, CODING_SOURCE))
+ − 848 chain = Fcons (Qbinary, chain);
+ − 849 wrapper =
+ − 850 make_internal_coding_system
+ − 851 (coding_system,
+ − 852 "internal-text-file-wrapper",
+ − 853 Qchain,
+ − 854 Qunbound, list4 (Qchain, chain,
+ − 855 Qcanonicalize_after_coding, coding_system));
+ − 856 XCODING_SYSTEM_TEXT_FILE_WRAPPER (coding_system) = wrapper;
+ − 857 }
+ − 858 }
+ − 859
+ − 860 if (!eol_wrap || XCODING_SYSTEM_EOL_TYPE (coding_system) != EOL_AUTODETECT)
+ − 861 return wrapper;
+ − 862
+ − 863 coding_system = wrapper;
+ − 864 wrapper = XCODING_SYSTEM_AUTO_EOL_WRAPPER (coding_system);
+ − 865 if (!NILP (wrapper))
+ − 866 return wrapper;
+ − 867 wrapper =
+ − 868 make_internal_coding_system
+ − 869 (coding_system,
+ − 870 "internal-auto-eol-wrapper",
+ − 871 Qundecided, Qunbound,
+ − 872 list4 (Qcoding_system, coding_system,
+ − 873 Qdo_eol, Qt));
+ − 874 XCODING_SYSTEM_AUTO_EOL_WRAPPER (coding_system) = wrapper;
+ − 875 return wrapper;
+ − 876 }
+ − 877
+ − 878 /* Like Fget_coding_system() but verify that the coding system is of the
+ − 879 "standard" type that decodes bytes into characters (suitable for
+ − 880 decoding a text file), and if not, returns an appropriate wrapper that
+ − 881 does. Also, if EOL_WRAP is non-zero, check whether this coding system
+ − 882 wants EOL auto-detection, and if so, wrap with a convert-eol coding
+ − 883 system to do this. */
+ − 884
+ − 885 Lisp_Object
+ − 886 get_coding_system_for_text_file (Lisp_Object name, int eol_wrap)
+ − 887 {
+ − 888 Lisp_Object coding_system = find_coding_system_for_text_file (name,
+ − 889 eol_wrap);
+ − 890 if (NILP (coding_system))
+ − 891 invalid_argument ("No such coding system", name);
+ − 892 return coding_system;
+ − 893 }
+ − 894
+ − 895 /* We store the coding systems in hash tables with the names as the
+ − 896 key and the actual coding system object as the value. Occasionally
+ − 897 we need to use them in a list format. These routines provide us
+ − 898 with that. */
428
+ − 899 struct coding_system_list_closure
+ − 900 {
+ − 901 Lisp_Object *coding_system_list;
771
+ − 902 int normal;
+ − 903 int internal;
428
+ − 904 };
+ − 905
+ − 906 static int
+ − 907 add_coding_system_to_list_mapper (Lisp_Object key, Lisp_Object value,
+ − 908 void *coding_system_list_closure)
+ − 909 {
+ − 910 /* This function can GC */
+ − 911 struct coding_system_list_closure *cscl =
+ − 912 (struct coding_system_list_closure *) coding_system_list_closure;
+ − 913 Lisp_Object *coding_system_list = cscl->coding_system_list;
+ − 914
771
+ − 915 /* We can't just use VALUE because KEY might be an alias, and we need
+ − 916 the real coding system object. */
+ − 917 if (XCODING_SYSTEM (Ffind_coding_system (key))->internal_p ?
+ − 918 cscl->internal : cscl->normal)
+ − 919 *coding_system_list = Fcons (key, *coding_system_list);
428
+ − 920 return 0;
+ − 921 }
+ − 922
771
+ − 923 DEFUN ("coding-system-list", Fcoding_system_list, 0, 1, 0, /*
428
+ − 924 Return a list of the names of all defined coding systems.
771
+ − 925 If INTERNAL is nil, only the normal (non-internal) coding systems are
+ − 926 included. (Internal coding systems are created for various internal
+ − 927 purposes, such as implementing EOL types of CRLF and CR; generally, you do
+ − 928 not want to see these.) If it is t, only the internal coding systems are
+ − 929 included. If it is any other non-nil value both normal and internal are
+ − 930 included.
428
+ − 931 */
771
+ − 932 (internal))
428
+ − 933 {
+ − 934 Lisp_Object coding_system_list = Qnil;
+ − 935 struct gcpro gcpro1;
+ − 936 struct coding_system_list_closure coding_system_list_closure;
+ − 937
+ − 938 GCPRO1 (coding_system_list);
+ − 939 coding_system_list_closure.coding_system_list = &coding_system_list;
771
+ − 940 coding_system_list_closure.normal = !EQ (internal, Qt);
+ − 941 coding_system_list_closure.internal = !NILP (internal);
428
+ − 942 elisp_maphash (add_coding_system_to_list_mapper, Vcoding_system_hash_table,
+ − 943 &coding_system_list_closure);
+ − 944 UNGCPRO;
+ − 945
+ − 946 return coding_system_list;
+ − 947 }
+ − 948
+ − 949 DEFUN ("coding-system-name", Fcoding_system_name, 1, 1, 0, /*
+ − 950 Return the name of the given coding system.
+ − 951 */
+ − 952 (coding_system))
+ − 953 {
+ − 954 coding_system = Fget_coding_system (coding_system);
+ − 955 return XCODING_SYSTEM_NAME (coding_system);
+ − 956 }
+ − 957
+ − 958 static Lisp_Coding_System *
771
+ − 959 allocate_coding_system (struct coding_system_methods *codesys_meths,
+ − 960 Bytecount data_size,
+ − 961 Lisp_Object name)
428
+ − 962 {
771
+ − 963 Bytecount total_size = offsetof (Lisp_Coding_System, data) + data_size;
428
+ − 964 Lisp_Coding_System *codesys =
771
+ − 965 (Lisp_Coding_System *) alloc_lcrecord (total_size, &lrecord_coding_system);
+ − 966
+ − 967 zero_sized_lcrecord (codesys, total_size);
+ − 968 codesys->methods = codesys_meths;
428
+ − 969 CODING_SYSTEM_PRE_WRITE_CONVERSION (codesys) = Qnil;
+ − 970 CODING_SYSTEM_POST_READ_CONVERSION (codesys) = Qnil;
771
+ − 971 CODING_SYSTEM_EOL_TYPE (codesys) = EOL_LF;
428
+ − 972 CODING_SYSTEM_EOL_CRLF (codesys) = Qnil;
+ − 973 CODING_SYSTEM_EOL_CR (codesys) = Qnil;
+ − 974 CODING_SYSTEM_EOL_LF (codesys) = Qnil;
771
+ − 975 CODING_SYSTEM_SUBSIDIARY_PARENT (codesys) = Qnil;
+ − 976 CODING_SYSTEM_CANONICAL (codesys) = Qnil;
428
+ − 977 CODING_SYSTEM_MNEMONIC (codesys) = Qnil;
771
+ − 978 CODING_SYSTEM_DOCUMENTATION (codesys) = Qnil;
+ − 979 CODING_SYSTEM_TEXT_FILE_WRAPPER (codesys) = Qnil;
+ − 980 CODING_SYSTEM_AUTO_EOL_WRAPPER (codesys) = Qnil;
+ − 981 CODING_SYSTEM_NAME (codesys) = name;
+ − 982
+ − 983 MAYBE_CODESYSMETH (codesys, init, (wrap_coding_system (codesys)));
428
+ − 984
+ − 985 return codesys;
+ − 986 }
+ − 987
771
+ − 988 static enum eol_type
+ − 989 symbol_to_eol_type (Lisp_Object symbol)
+ − 990 {
+ − 991 CHECK_SYMBOL (symbol);
+ − 992 if (NILP (symbol)) return EOL_AUTODETECT;
+ − 993 if (EQ (symbol, Qlf)) return EOL_LF;
+ − 994 if (EQ (symbol, Qcrlf)) return EOL_CRLF;
+ − 995 if (EQ (symbol, Qcr)) return EOL_CR;
+ − 996
+ − 997 invalid_constant ("Unrecognized eol type", symbol);
801
+ − 998 RETURN_NOT_REACHED (EOL_AUTODETECT)
771
+ − 999 }
+ − 1000
+ − 1001 static Lisp_Object
+ − 1002 eol_type_to_symbol (enum eol_type type)
+ − 1003 {
+ − 1004 switch (type)
+ − 1005 {
+ − 1006 default: abort ();
+ − 1007 case EOL_LF: return Qlf;
+ − 1008 case EOL_CRLF: return Qcrlf;
+ − 1009 case EOL_CR: return Qcr;
+ − 1010 case EOL_AUTODETECT: return Qnil;
+ − 1011 }
+ − 1012 }
+ − 1013
+ − 1014 struct subsidiary_type
+ − 1015 {
+ − 1016 Char_ASCII *extension;
+ − 1017 Char_ASCII *mnemonic_ext;
+ − 1018 enum eol_type eol;
+ − 1019 };
+ − 1020
+ − 1021 static struct subsidiary_type coding_subsidiary_list[] =
+ − 1022 { { "-unix", "", EOL_LF },
+ − 1023 { "-dos", ":T", EOL_CRLF },
+ − 1024 { "-mac", ":t", EOL_CR } };
+ − 1025
+ − 1026 /* kludge */
428
+ − 1027 static void
771
+ − 1028 setup_eol_coding_systems (Lisp_Object codesys)
428
+ − 1029 {
793
+ − 1030 int len = XSTRING_LENGTH (XSYMBOL (XCODING_SYSTEM_NAME (codesys))->name);
867
+ − 1031 Ibyte *codesys_name = (Ibyte *) ALLOCA (len + 7);
771
+ − 1032 int mlen = -1;
867
+ − 1033 Ibyte *codesys_mnemonic = 0;
771
+ − 1034 Lisp_Object codesys_name_sym, sub_codesys;
+ − 1035 int i;
+ − 1036
+ − 1037 memcpy (codesys_name,
793
+ − 1038 XSTRING_DATA (XSYMBOL (XCODING_SYSTEM_NAME (codesys))->name), len);
771
+ − 1039
+ − 1040 if (STRINGP (XCODING_SYSTEM_MNEMONIC (codesys)))
428
+ − 1041 {
771
+ − 1042 mlen = XSTRING_LENGTH (XCODING_SYSTEM_MNEMONIC (codesys));
867
+ − 1043 codesys_mnemonic = (Ibyte *) ALLOCA (mlen + 7);
771
+ − 1044 memcpy (codesys_mnemonic,
+ − 1045 XSTRING_DATA (XCODING_SYSTEM_MNEMONIC (codesys)), mlen);
+ − 1046 }
+ − 1047
+ − 1048 /* Create three "subsidiary" coding systems, decoding data encoded using
+ − 1049 each of the three EOL types. We do this for each subsidiary by
+ − 1050 copying the original coding system, setting the EOL type
+ − 1051 appropriately, and setting the CANONICAL member of the new coding
+ − 1052 system to be a chain consisting of the original coding system followed
+ − 1053 by a convert-eol coding system to do the EOL decoding. For EOL type
+ − 1054 LF, however, we don't need any decoding, so we skip creating a
+ − 1055 CANONICAL.
+ − 1056
+ − 1057 If the original coding system is not a text-type coding system
+ − 1058 (decodes byte->char), we need to coerce it to one by the appropriate
+ − 1059 wrapping in CANONICAL. */
+ − 1060
+ − 1061 for (i = 0; i < countof (coding_subsidiary_list); i++)
+ − 1062 {
+ − 1063 Char_ASCII *extension = coding_subsidiary_list[i].extension;
+ − 1064 Char_ASCII *mnemonic_ext = coding_subsidiary_list[i].mnemonic_ext;
+ − 1065 enum eol_type eol = coding_subsidiary_list[i].eol;
+ − 1066
+ − 1067 qxestrcpy_c (codesys_name + len, extension);
+ − 1068 codesys_name_sym = intern_int (codesys_name);
+ − 1069 if (mlen != -1)
+ − 1070 qxestrcpy_c (codesys_mnemonic + mlen, mnemonic_ext);
+ − 1071
+ − 1072 sub_codesys = Fcopy_coding_system (codesys, codesys_name_sym);
+ − 1073 if (mlen != -1)
+ − 1074 XCODING_SYSTEM_MNEMONIC (sub_codesys) =
+ − 1075 build_intstring (codesys_mnemonic);
+ − 1076
+ − 1077 if (eol != EOL_LF)
+ − 1078 {
+ − 1079 Lisp_Object chain = list2 (get_coding_system_for_text_file
+ − 1080 (codesys, 0),
+ − 1081 eol == EOL_CR ? Qconvert_eol_cr :
+ − 1082 Qconvert_eol_crlf);
+ − 1083 Lisp_Object canon =
+ − 1084 make_internal_coding_system
+ − 1085 (sub_codesys, "internal-subsidiary-eol-wrapper",
+ − 1086 Qchain, Qunbound,
+ − 1087 mlen != -1 ?
+ − 1088 list6 (Qmnemonic, build_intstring (codesys_mnemonic),
+ − 1089 Qchain, chain,
+ − 1090 Qcanonicalize_after_coding, sub_codesys) :
+ − 1091 list4 (Qchain, chain,
+ − 1092 Qcanonicalize_after_coding, sub_codesys));
+ − 1093 XCODING_SYSTEM_CANONICAL (sub_codesys) = canon;
+ − 1094 }
+ − 1095 XCODING_SYSTEM_EOL_TYPE (sub_codesys) = eol;
+ − 1096 XCODING_SYSTEM_SUBSIDIARY_PARENT (sub_codesys) = codesys;
+ − 1097 XCODING_SYSTEM (codesys)->eol[eol] = sub_codesys;
428
+ − 1098 }
+ − 1099 }
+ − 1100
771
+ − 1101 /* Basic function to create new coding systems. For `make-coding-system',
+ − 1102 NAME-OR-EXISTING is the NAME argument, PREFIX is null, and TYPE,
+ − 1103 DESCRIPTION, and PROPS are the same. All created coding systems are put
+ − 1104 in a hash table indexed by NAME.
+ − 1105
+ − 1106 If PREFIX is a string, NAME-OR-EXISTING should specify an existing
+ − 1107 coding system (or nil), and an internal coding system will be created.
+ − 1108 The name of the coding system will be constructed by combining PREFIX
+ − 1109 with the name of the existing coding system (if given), and a number
+ − 1110 will be appended to insure uniqueness. In such a case, if Qunbound is
+ − 1111 given for DESCRIPTION, the description gets created based on the
+ − 1112 generated name. Also, if no mnemonic is given in the properties list, a
+ − 1113 mnemonic is created based on the generated name.
+ − 1114
+ − 1115 For internal coding systems, the coding system is marked as internal
+ − 1116 (see `coding-system-list'), and no subsidiaries will be created or
+ − 1117 eol-wrapping will happen. Otherwise:
+ − 1118
+ − 1119 -- if the eol-type property is `lf' or t, the coding system is merely
+ − 1120 created and returned. (For t, the coding system will be wrapped with
+ − 1121 an EOL autodetector when it's used to read a file.)
+ − 1122
+ − 1123 -- if eol-type is `crlf' or `cr', after the coding system object is
+ − 1124 created, it will be wrapped in a chain with the appropriate
+ − 1125 convert-eol coding system (either `convert-eol-crlf' or
+ − 1126 `convert-eol-cr'), so that CRLF->LF or CR->LF conversion is done at
+ − 1127 decoding time, and the opposite at encoding time. The resulting
+ − 1128 chain becomes the CANONICAL field of the coding system object.
+ − 1129
+ − 1130 -- if eol-type is nil or omitted, "subsidiaries" are generated: Three
+ − 1131 coding systems where the original coding system (before wrapping with
+ − 1132 convert-eol-autodetect) is either unwrapped or wrapped with
+ − 1133 convert-eol-crlf or convert-eol-cr, respectively, so that coding systems
+ − 1134 to handle LF, CRLF, and CR end-of-line indicators are created. (This
+ − 1135 crazy crap is based on existing behavior in other Mule versions,
+ − 1136 including FSF Emacs.)
+ − 1137 */
428
+ − 1138
+ − 1139 static Lisp_Object
771
+ − 1140 make_coding_system_1 (Lisp_Object name_or_existing, Char_ASCII *prefix,
+ − 1141 Lisp_Object type, Lisp_Object description,
+ − 1142 Lisp_Object props)
428
+ − 1143 {
771
+ − 1144 Lisp_Coding_System *cs;
+ − 1145 int need_to_setup_eol_systems = 1;
+ − 1146 enum eol_type eol_wrapper = EOL_AUTODETECT;
+ − 1147 struct coding_system_methods *meths;
+ − 1148 Lisp_Object csobj;
+ − 1149 Lisp_Object defmnem = Qnil;
+ − 1150
+ − 1151 if (NILP (type))
+ − 1152 type = Qundecided;
+ − 1153 meths = decode_coding_system_type (type, ERROR_ME);
+ − 1154
+ − 1155 if (prefix)
428
+ − 1156 {
867
+ − 1157 Ibyte *newname =
771
+ − 1158 emacs_sprintf_malloc (NULL, "%s-%s-%d",
+ − 1159 prefix,
867
+ − 1160 NILP (name_or_existing) ? (Ibyte *) "nil" :
771
+ − 1161 XSTRING_DATA (Fsymbol_name (XCODING_SYSTEM_NAME
+ − 1162 (name_or_existing))),
+ − 1163 ++coding_system_tick);
+ − 1164 name_or_existing = intern_int (newname);
+ − 1165 xfree (newname);
+ − 1166
+ − 1167 if (UNBOUNDP (description))
+ − 1168 {
+ − 1169 newname =
+ − 1170 emacs_sprintf_malloc
+ − 1171 (NULL, "For Internal Use (%s)",
+ − 1172 XSTRING_DATA (Fsymbol_name (name_or_existing)));
+ − 1173 description = build_intstring (newname);
+ − 1174 xfree (newname);
+ − 1175 }
+ − 1176
+ − 1177 newname = emacs_sprintf_malloc (NULL, "Int%d", coding_system_tick);
+ − 1178 defmnem = build_intstring (newname);
945
+ − 1179 xfree (newname);
428
+ − 1180 }
771
+ − 1181 else
+ − 1182 CHECK_SYMBOL (name_or_existing);
+ − 1183
+ − 1184 if (!NILP (Ffind_coding_system (name_or_existing)))
+ − 1185 invalid_operation ("Cannot redefine existing coding system",
+ − 1186 name_or_existing);
+ − 1187
+ − 1188 cs = allocate_coding_system (meths, meths->extra_data_size,
+ − 1189 name_or_existing);
793
+ − 1190 csobj = wrap_coding_system (cs);
771
+ − 1191
+ − 1192 cs->internal_p = !!prefix;
+ − 1193
+ − 1194 if (NILP (description))
+ − 1195 description = build_string ("");
+ − 1196 else
+ − 1197 CHECK_STRING (description);
+ − 1198 CODING_SYSTEM_DESCRIPTION (cs) = description;
+ − 1199
+ − 1200 if (!NILP (defmnem))
+ − 1201 CODING_SYSTEM_MNEMONIC (cs) = defmnem;
+ − 1202
+ − 1203 {
+ − 1204 EXTERNAL_PROPERTY_LIST_LOOP_3 (key, value, props)
+ − 1205 {
+ − 1206 int recognized = 1;
+ − 1207
+ − 1208 if (EQ (key, Qmnemonic))
+ − 1209 {
+ − 1210 if (!NILP (value))
+ − 1211 CHECK_STRING (value);
+ − 1212 CODING_SYSTEM_MNEMONIC (cs) = value;
+ − 1213 }
+ − 1214
+ − 1215 else if (EQ (key, Qdocumentation))
+ − 1216 {
+ − 1217 if (!NILP (value))
+ − 1218 CHECK_STRING (value);
+ − 1219 CODING_SYSTEM_DOCUMENTATION (cs) = value;
+ − 1220 }
+ − 1221
+ − 1222 else if (EQ (key, Qeol_type))
+ − 1223 {
+ − 1224 need_to_setup_eol_systems = NILP (value);
+ − 1225 if (EQ (value, Qt))
+ − 1226 value = Qnil;
+ − 1227 eol_wrapper = symbol_to_eol_type (value);
+ − 1228 }
+ − 1229
+ − 1230 else if (EQ (key, Qpost_read_conversion))
+ − 1231 CODING_SYSTEM_POST_READ_CONVERSION (cs) = value;
+ − 1232 else if (EQ (key, Qpre_write_conversion))
+ − 1233 CODING_SYSTEM_PRE_WRITE_CONVERSION (cs) = value;
+ − 1234 /* FSF compatibility */
+ − 1235 else if (EQ (key, Qtranslation_table_for_decode))
+ − 1236 ;
+ − 1237 else if (EQ (key, Qtranslation_table_for_encode))
+ − 1238 ;
+ − 1239 else if (EQ (key, Qsafe_chars))
+ − 1240 ;
+ − 1241 else if (EQ (key, Qsafe_charsets))
+ − 1242 ;
+ − 1243 else if (EQ (key, Qmime_charset))
+ − 1244 ;
+ − 1245 else if (EQ (key, Qvalid_codes))
+ − 1246 ;
+ − 1247 else
+ − 1248 recognized = CODESYSMETH_OR_GIVEN (cs, putprop,
+ − 1249 (csobj, key, value), 0);
+ − 1250
+ − 1251 if (!recognized)
+ − 1252 invalid_constant ("Unrecognized property", key);
+ − 1253 }
+ − 1254 }
+ − 1255
+ − 1256 {
+ − 1257 XCODING_SYSTEM_CANONICAL (csobj) =
+ − 1258 CODESYSMETH_OR_GIVEN (cs, canonicalize, (csobj), Qnil);
+ − 1259 XCODING_SYSTEM_EOL_TYPE (csobj) = EOL_AUTODETECT; /* for copy-coding-system
+ − 1260 below */
+ − 1261
+ − 1262 if (need_to_setup_eol_systems && !cs->internal_p)
+ − 1263 setup_eol_coding_systems (csobj);
+ − 1264 else if (eol_wrapper == EOL_CR || eol_wrapper == EOL_CRLF)
+ − 1265 {
+ − 1266 /* If a specific eol-type (other than LF) was specified, we handle
+ − 1267 this by converting the coding system into a chain that wraps the
+ − 1268 coding system along with a convert-eol system after it, in
+ − 1269 exactly that same switcheroo fashion that the normal
+ − 1270 canonicalize method works -- BUT we will run into a problem if
+ − 1271 we do it the obvious way, because when `chain' creates its
+ − 1272 substreams, the substream containing the coding system we're
+ − 1273 creating will have canonicalization expansion done on it,
+ − 1274 leading to infinite recursion. So we have to generate a new,
+ − 1275 internal coding system with the previous value of CANONICAL. */
867
+ − 1276 Ibyte *newname =
771
+ − 1277 emacs_sprintf_malloc
+ − 1278 (NULL, "internal-eol-copy-%s-%d",
+ − 1279 XSTRING_DATA (Fsymbol_name (name_or_existing)),
+ − 1280 ++coding_system_tick);
+ − 1281 Lisp_Object newnamesym = intern_int (newname);
+ − 1282 Lisp_Object copied = Fcopy_coding_system (csobj, newnamesym);
+ − 1283 xfree (newname);
+ − 1284
+ − 1285 XCODING_SYSTEM_CANONICAL (csobj) =
+ − 1286 make_internal_coding_system
+ − 1287 (csobj,
+ − 1288 "internal-eol-wrapper",
+ − 1289 Qchain, Qunbound,
+ − 1290 list4 (Qchain,
+ − 1291 list2 (copied,
+ − 1292 eol_wrapper == EOL_CR ?
+ − 1293 Qconvert_eol_cr :
+ − 1294 Qconvert_eol_crlf),
+ − 1295 Qcanonicalize_after_coding,
+ − 1296 csobj));
+ − 1297 }
+ − 1298 XCODING_SYSTEM_EOL_TYPE (csobj) = eol_wrapper;
+ − 1299 }
+ − 1300
+ − 1301 Fputhash (name_or_existing, csobj, Vcoding_system_hash_table);
+ − 1302
+ − 1303 return csobj;
428
+ − 1304 }
+ − 1305
771
+ − 1306 Lisp_Object
+ − 1307 make_internal_coding_system (Lisp_Object existing, Char_ASCII *prefix,
+ − 1308 Lisp_Object type, Lisp_Object description,
+ − 1309 Lisp_Object props)
+ − 1310 {
+ − 1311 return make_coding_system_1 (existing, prefix, type, description, props);
+ − 1312 }
428
+ − 1313
+ − 1314 DEFUN ("make-coding-system", Fmake_coding_system, 2, 4, 0, /*
+ − 1315 Register symbol NAME as a coding system.
+ − 1316
+ − 1317 TYPE describes the conversion method used and should be one of
+ − 1318
+ − 1319 nil or 'undecided
+ − 1320 Automatic conversion. XEmacs attempts to detect the coding system
+ − 1321 used in the file.
771
+ − 1322 'chain
+ − 1323 Chain two or more coding systems together to make a combination coding
+ − 1324 system.
428
+ − 1325 'no-conversion
+ − 1326 No conversion. Use this for binary files and such. On output,
+ − 1327 graphic characters that are not in ASCII or Latin-1 will be
+ − 1328 replaced by a ?. (For a no-conversion-encoded buffer, these
+ − 1329 characters will only be present if you explicitly insert them.)
771
+ − 1330 'convert-eol
+ − 1331 Convert CRLF sequences or CR to LF.
428
+ − 1332 'shift-jis
+ − 1333 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
771
+ − 1334 'unicode
+ − 1335 Any Unicode encoding (UCS-4, UTF-8, UTF-16, etc.).
+ − 1336 'mswindows-unicode-to-multibyte
+ − 1337 (MS Windows only) Converts from Windows Unicode to Windows Multibyte
+ − 1338 (any code page encoding) upon encoding, and the other way upon decoding.
+ − 1339 'mswindows-multibyte
+ − 1340 Converts to or from Windows Multibyte (any code page encoding).
+ − 1341 This is resolved into a chain of `mswindows-unicode' and
+ − 1342 `mswindows-unicode-to-multibyte'.
428
+ − 1343 'iso2022
+ − 1344 Any ISO2022-compliant encoding. Among other things, this includes
+ − 1345 JIS (the Japanese encoding commonly used for e-mail), EUC (the
+ − 1346 standard Unix encoding for Japanese and other languages), and
+ − 1347 Compound Text (the encoding used in X11). You can specify more
442
+ − 1348 specific information about the conversion with the PROPS argument.
428
+ − 1349 'big5
+ − 1350 Big5 (the encoding commonly used for Taiwanese).
+ − 1351 'ccl
+ − 1352 The conversion is performed using a user-written pseudo-code
+ − 1353 program. CCL (Code Conversion Language) is the name of this
+ − 1354 pseudo-code.
771
+ − 1355 'gzip
+ − 1356 GZIP compression format.
428
+ − 1357 'internal
+ − 1358 Write out or read in the raw contents of the memory representing
+ − 1359 the buffer's text. This is primarily useful for debugging
+ − 1360 purposes, and is only enabled when XEmacs has been compiled with
+ − 1361 DEBUG_XEMACS defined (via the --debug configure option).
+ − 1362 WARNING: Reading in a file using 'internal conversion can result
+ − 1363 in an internal inconsistency in the memory representing a
+ − 1364 buffer's text, which will produce unpredictable results and may
+ − 1365 cause XEmacs to crash. Under normal circumstances you should
+ − 1366 never use 'internal conversion.
+ − 1367
771
+ − 1368 DESCRIPTION is a short English phrase describing the coding system,
+ − 1369 suitable for use as a menu item. (See also the `documentation' property
+ − 1370 below.)
428
+ − 1371
+ − 1372 PROPS is a property list, describing the specific nature of the
+ − 1373 character set. Recognized properties are:
+ − 1374
+ − 1375 'mnemonic
+ − 1376 String to be displayed in the modeline when this coding system is
+ − 1377 active.
+ − 1378
771
+ − 1379 'documentation
+ − 1380 Detailed documentation on the coding system.
+ − 1381
428
+ − 1382 'eol-type
+ − 1383 End-of-line conversion to be used. It should be one of
+ − 1384
+ − 1385 nil
+ − 1386 Automatically detect the end-of-line type (LF, CRLF,
+ − 1387 or CR). Also generate subsidiary coding systems named
+ − 1388 `NAME-unix', `NAME-dos', and `NAME-mac', that are
+ − 1389 identical to this coding system but have an EOL-TYPE
+ − 1390 value of 'lf, 'crlf, and 'cr, respectively.
+ − 1391 'lf
+ − 1392 The end of a line is marked externally using ASCII LF.
+ − 1393 Since this is also the way that XEmacs represents an
+ − 1394 end-of-line internally, specifying this option results
+ − 1395 in no end-of-line conversion. This is the standard
+ − 1396 format for Unix text files.
+ − 1397 'crlf
+ − 1398 The end of a line is marked externally using ASCII
+ − 1399 CRLF. This is the standard format for MS-DOS text
+ − 1400 files.
+ − 1401 'cr
+ − 1402 The end of a line is marked externally using ASCII CR.
+ − 1403 This is the standard format for Macintosh text files.
+ − 1404 t
+ − 1405 Automatically detect the end-of-line type but do not
+ − 1406 generate subsidiary coding systems. (This value is
+ − 1407 converted to nil when stored internally, and
+ − 1408 `coding-system-property' will return nil.)
+ − 1409
+ − 1410 'post-read-conversion
771
+ − 1411 The value is a function to call after some text is inserted and
+ − 1412 decoded by the coding system itself and before any functions in
+ − 1413 `after-change-functions' are called. (#### Not actually true in
+ − 1414 XEmacs. `after-change-functions' will be called twice if
+ − 1415 `post-read-conversion' changes something.) The argument of this
+ − 1416 function is the same as for a function in
+ − 1417 `after-insert-file-functions', i.e. LENGTH of the text inserted,
+ − 1418 with point at the head of the text to be decoded.
428
+ − 1419
+ − 1420 'pre-write-conversion
771
+ − 1421 The value is a function to call after all functions in
+ − 1422 `write-region-annotate-functions' and `buffer-file-format' are
+ − 1423 called, and before the text is encoded by the coding system itself.
+ − 1424 The arguments to this function are the same as those of a function
+ − 1425 in `write-region-annotate-functions', i.e. FROM and TO, specifying
+ − 1426 a region of text.
+ − 1427
+ − 1428
+ − 1429
+ − 1430 The following properties are allowed for FSF compatibility but currently
+ − 1431 ignored:
+ − 1432
+ − 1433 'translation-table-for-decode
+ − 1434 The value is a translation table to be applied on decoding. See
+ − 1435 the function `make-translation-table' for the format of translation
+ − 1436 table. This is not applicable to CCL-based coding systems.
+ − 1437
+ − 1438 'translation-table-for-encode
+ − 1439 The value is a translation table to be applied on encoding. This is
+ − 1440 not applicable to CCL-based coding systems.
+ − 1441
+ − 1442 'safe-chars
+ − 1443 The value is a char table. If a character has non-nil value in it,
+ − 1444 the character is safely supported by the coding system. This
+ − 1445 overrides the specification of safe-charsets.
+ − 1446
+ − 1447 'safe-charsets
+ − 1448 The value is a list of charsets safely supported by the coding
+ − 1449 system. The value t means that all charsets Emacs handles are
+ − 1450 supported. Even if some charset is not in this list, it doesn't
+ − 1451 mean that the charset can't be encoded in the coding system;
+ − 1452 it just means that some other receiver of text encoded
+ − 1453 in the coding system won't be able to handle that charset.
+ − 1454
+ − 1455 'mime-charset
+ − 1456 The value is a symbol of which name is `MIME-charset' parameter of
+ − 1457 the coding system.
+ − 1458
+ − 1459 'valid-codes (meaningful only for a coding system based on CCL)
+ − 1460 The value is a list to indicate valid byte ranges of the encoded
+ − 1461 file. Each element of the list is an integer or a cons of integer.
+ − 1462 In the former case, the integer value is a valid byte code. In the
+ − 1463 latter case, the integers specifies the range of valid byte codes.
+ − 1464
+ − 1465
+ − 1466
+ − 1467 The following additional property is recognized if TYPE is 'convert-eol:
+ − 1468
+ − 1469 'subtype
793
+ − 1470 One of `lf', `crlf', `cr' or nil (for autodetection). When decoding,
+ − 1471 the corresponding sequence will be converted to LF. When encoding,
+ − 1472 the opposite happens. This coding system converts characters to
771
+ − 1473 characters.
+ − 1474
428
+ − 1475
+ − 1476
+ − 1477 The following additional properties are recognized if TYPE is 'iso2022:
+ − 1478
+ − 1479 'charset-g0
+ − 1480 'charset-g1
+ − 1481 'charset-g2
+ − 1482 'charset-g3
+ − 1483 The character set initially designated to the G0 - G3 registers.
+ − 1484 The value should be one of
+ − 1485
+ − 1486 -- A charset object (designate that character set)
+ − 1487 -- nil (do not ever use this register)
+ − 1488 -- t (no character set is initially designated to
+ − 1489 the register, but may be later on; this automatically
+ − 1490 sets the corresponding `force-g*-on-output' property)
+ − 1491
+ − 1492 'force-g0-on-output
+ − 1493 'force-g1-on-output
+ − 1494 'force-g2-on-output
+ − 1495 'force-g2-on-output
+ − 1496 If non-nil, send an explicit designation sequence on output before
+ − 1497 using the specified register.
+ − 1498
+ − 1499 'short
+ − 1500 If non-nil, use the short forms "ESC $ @", "ESC $ A", and
+ − 1501 "ESC $ B" on output in place of the full designation sequences
+ − 1502 "ESC $ ( @", "ESC $ ( A", and "ESC $ ( B".
+ − 1503
+ − 1504 'no-ascii-eol
+ − 1505 If non-nil, don't designate ASCII to G0 at each end of line on output.
+ − 1506 Setting this to non-nil also suppresses other state-resetting that
+ − 1507 normally happens at the end of a line.
+ − 1508
+ − 1509 'no-ascii-cntl
+ − 1510 If non-nil, don't designate ASCII to G0 before control chars on output.
+ − 1511
+ − 1512 'seven
+ − 1513 If non-nil, use 7-bit environment on output. Otherwise, use 8-bit
+ − 1514 environment.
+ − 1515
+ − 1516 'lock-shift
+ − 1517 If non-nil, use locking-shift (SO/SI) instead of single-shift
+ − 1518 or designation by escape sequence.
+ − 1519
+ − 1520 'no-iso6429
+ − 1521 If non-nil, don't use ISO6429's direction specification.
+ − 1522
+ − 1523 'escape-quoted
+ − 1524 If non-nil, literal control characters that are the same as
+ − 1525 the beginning of a recognized ISO2022 or ISO6429 escape sequence
+ − 1526 (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E),
+ − 1527 SS3 (0x8F), and CSI (0x9B)) are "quoted" with an escape character
+ − 1528 so that they can be properly distinguished from an escape sequence.
+ − 1529 (Note that doing this results in a non-portable encoding.) This
+ − 1530 encoding flag is used for byte-compiled files. Note that ESC
+ − 1531 is a good choice for a quoting character because there are no
+ − 1532 escape sequences whose second byte is a character from the Control-0
+ − 1533 or Control-1 character sets; this is explicitly disallowed by the
+ − 1534 ISO2022 standard.
+ − 1535
+ − 1536 'input-charset-conversion
+ − 1537 A list of conversion specifications, specifying conversion of
+ − 1538 characters in one charset to another when decoding is performed.
+ − 1539 Each specification is a list of two elements: the source charset,
+ − 1540 and the destination charset.
+ − 1541
+ − 1542 'output-charset-conversion
+ − 1543 A list of conversion specifications, specifying conversion of
+ − 1544 characters in one charset to another when encoding is performed.
+ − 1545 The form of each specification is the same as for
+ − 1546 'input-charset-conversion.
+ − 1547
+ − 1548
771
+ − 1549
428
+ − 1550 The following additional properties are recognized (and required)
+ − 1551 if TYPE is 'ccl:
+ − 1552
+ − 1553 'decode
+ − 1554 CCL program used for decoding (converting to internal format).
+ − 1555
+ − 1556 'encode
+ − 1557 CCL program used for encoding (converting to external format).
771
+ − 1558
+ − 1559
+ − 1560 The following additional properties are recognized if TYPE is 'chain:
+ − 1561
+ − 1562 'chain
+ − 1563 List of coding systems to be chained together, in decoding order.
+ − 1564
+ − 1565 'canonicalize-after-coding
+ − 1566 Coding system to be returned by the detector routines in place of
+ − 1567 this coding system.
+ − 1568
+ − 1569
+ − 1570
+ − 1571 The following additional properties are recognized if TYPE is 'unicode:
+ − 1572
+ − 1573 'type
+ − 1574 One of `utf-16', `utf-8', `ucs-4', or `utf-7' (the latter is not
+ − 1575 yet implemented). `utf-16' is the basic two-byte encoding;
+ − 1576 `ucs-4' is the four-byte encoding; `utf-8' is an ASCII-compatible
+ − 1577 variable-width 8-bit encoding; `utf-7' is a 7-bit encoding using
+ − 1578 only characters that will safely pass through all mail gateways.
+ − 1579
+ − 1580 'little-endian
+ − 1581 If non-nil, `utf-16' and `ucs-4' will write out the groups of two
+ − 1582 or four bytes little-endian instead of big-endian. This is required,
+ − 1583 for example, under Windows.
+ − 1584
+ − 1585 'need-bom
+ − 1586 If non-nil, a byte order mark (BOM, or Unicode FFFE) should be
+ − 1587 written out at the beginning of the data. This serves both to
+ − 1588 identify the endianness of the following data and to mark the
+ − 1589 data as Unicode (at least, this is how Windows uses it).
+ − 1590
+ − 1591
+ − 1592
+ − 1593 The following additional properties are recognized if TYPE is
+ − 1594 'mswindows-multibyte:
+ − 1595
+ − 1596 'code-page
+ − 1597 Either a number (specifying a particular code page) or one of the
+ − 1598 symbols `ansi', `oem', `mac', or `ebcdic', specifying the ANSI,
+ − 1599 OEM, Macintosh, or EBCDIC code page associated with a particular
+ − 1600 locale (given by the `locale' property). NOTE: EBCDIC code pages
+ − 1601 only exist in Windows 2000 and later.
+ − 1602
+ − 1603 'locale
+ − 1604 If `code-page' is a symbol, this specifies the locale whose code
+ − 1605 page of the corresponding type should be used. This should be
+ − 1606 one of the following: A cons of two strings, (LANGUAGE
+ − 1607 . SUBLANGUAGE) (see `mswindows-set-current-locale'); a string (a
+ − 1608 language; SUBLANG_DEFAULT, i.e. the default sublanguage, is
+ − 1609 used); or one of the symbols `current', `user-default', or
+ − 1610 `system-default', corresponding to the values of
+ − 1611 `mswindows-current-locale', `mswindows-user-default-locale', or
+ − 1612 `mswindows-system-default-locale', respectively.
+ − 1613
+ − 1614
+ − 1615
+ − 1616 The following additional properties are recognized if TYPE is 'undecided:
+ − 1617
+ − 1618 'do-eol
+ − 1619 Do EOL detection.
+ − 1620
+ − 1621 'do-coding
+ − 1622 Do encoding detection.
+ − 1623
+ − 1624 'coding-system
+ − 1625 If encoding detection is not done, use the specified coding system
+ − 1626 to do decoding. This is used internally when implementing coding
+ − 1627 systems with an EOL type that specifies autodetection (the default),
+ − 1628 so that the detector routines return the proper subsidiary.
+ − 1629
+ − 1630
+ − 1631
+ − 1632 The following additional property is recognized if TYPE is 'gzip:
+ − 1633
+ − 1634 'level
+ − 1635 Compression level: 0 through 9, or `default' (currently 6).
+ − 1636
428
+ − 1637 */
771
+ − 1638 (name, type, description, props))
428
+ − 1639 {
771
+ − 1640 return make_coding_system_1 (name, 0, type, description, props);
428
+ − 1641 }
+ − 1642
+ − 1643 DEFUN ("copy-coding-system", Fcopy_coding_system, 2, 2, 0, /*
+ − 1644 Copy OLD-CODING-SYSTEM to NEW-NAME.
+ − 1645 If NEW-NAME does not name an existing coding system, a new one will
+ − 1646 be created.
771
+ − 1647 If you are using this function to create an alias, think again:
+ − 1648 Use `define-coding-system-alias' instead.
428
+ − 1649 */
+ − 1650 (old_coding_system, new_name))
+ − 1651 {
+ − 1652 Lisp_Object new_coding_system;
+ − 1653 old_coding_system = Fget_coding_system (old_coding_system);
771
+ − 1654 new_coding_system =
+ − 1655 UNBOUNDP (new_name) ? Qnil : Ffind_coding_system (new_name);
428
+ − 1656 if (NILP (new_coding_system))
+ − 1657 {
793
+ − 1658 new_coding_system =
+ − 1659 wrap_coding_system
+ − 1660 (allocate_coding_system
+ − 1661 (XCODING_SYSTEM (old_coding_system)->methods,
+ − 1662 XCODING_SYSTEM (old_coding_system)->methods->extra_data_size,
+ − 1663 new_name));
771
+ − 1664 if (!UNBOUNDP (new_name))
+ − 1665 Fputhash (new_name, new_coding_system, Vcoding_system_hash_table);
428
+ − 1666 }
771
+ − 1667 else if (XCODING_SYSTEM (old_coding_system)->methods !=
+ − 1668 XCODING_SYSTEM (new_coding_system)->methods)
+ − 1669 invalid_operation_2 ("Coding systems not same type",
+ − 1670 old_coding_system, new_coding_system);
428
+ − 1671
+ − 1672 {
+ − 1673 Lisp_Coding_System *to = XCODING_SYSTEM (new_coding_system);
+ − 1674 Lisp_Coding_System *from = XCODING_SYSTEM (old_coding_system);
771
+ − 1675 copy_sized_lcrecord (to, from, sizeof_coding_system (from));
428
+ − 1676 to->name = new_name;
+ − 1677 }
+ − 1678 return new_coding_system;
+ − 1679 }
+ − 1680
771
+ − 1681 DEFUN ("coding-system-canonical-name-p", Fcoding_system_canonical_name_p,
+ − 1682 1, 1, 0, /*
440
+ − 1683 Return t if OBJECT names a coding system, and is not a coding system alias.
428
+ − 1684 */
440
+ − 1685 (object))
+ − 1686 {
+ − 1687 return CODING_SYSTEMP (Fgethash (object, Vcoding_system_hash_table, Qnil))
+ − 1688 ? Qt : Qnil;
+ − 1689 }
+ − 1690
+ − 1691 DEFUN ("coding-system-alias-p", Fcoding_system_alias_p, 1, 1, 0, /*
+ − 1692 Return t if OBJECT is a coding system alias.
+ − 1693 All coding system aliases are created by `define-coding-system-alias'.
+ − 1694 */
+ − 1695 (object))
428
+ − 1696 {
440
+ − 1697 return SYMBOLP (Fgethash (object, Vcoding_system_hash_table, Qzero))
+ − 1698 ? Qt : Qnil;
+ − 1699 }
+ − 1700
+ − 1701 DEFUN ("coding-system-aliasee", Fcoding_system_aliasee, 1, 1, 0, /*
+ − 1702 Return the coding-system symbol for which symbol ALIAS is an alias.
+ − 1703 */
+ − 1704 (alias))
+ − 1705 {
+ − 1706 Lisp_Object aliasee = Fgethash (alias, Vcoding_system_hash_table, Qnil);
+ − 1707 if (SYMBOLP (aliasee))
+ − 1708 return aliasee;
+ − 1709 else
563
+ − 1710 invalid_argument ("Symbol is not a coding system alias", alias);
801
+ − 1711 RETURN_NOT_REACHED (Qnil)
440
+ − 1712 }
+ − 1713
+ − 1714 /* A maphash function, for removing dangling coding system aliases. */
+ − 1715 static int
+ − 1716 dangling_coding_system_alias_p (Lisp_Object alias,
+ − 1717 Lisp_Object aliasee,
+ − 1718 void *dangling_aliases)
+ − 1719 {
+ − 1720 if (SYMBOLP (aliasee)
+ − 1721 && NILP (Fgethash (aliasee, Vcoding_system_hash_table, Qnil)))
428
+ − 1722 {
440
+ − 1723 (*(int *) dangling_aliases)++;
+ − 1724 return 1;
428
+ − 1725 }
440
+ − 1726 else
+ − 1727 return 0;
+ − 1728 }
+ − 1729
+ − 1730 DEFUN ("define-coding-system-alias", Fdefine_coding_system_alias, 2, 2, 0, /*
+ − 1731 Define symbol ALIAS as an alias for coding system ALIASEE.
+ − 1732
+ − 1733 You can use this function to redefine an alias that has already been defined,
+ − 1734 but you cannot redefine a name which is the canonical name for a coding system.
+ − 1735 \(a canonical name of a coding system is what is returned when you call
+ − 1736 `coding-system-name' on a coding system).
+ − 1737
+ − 1738 ALIASEE itself can be an alias, which allows you to define nested aliases.
+ − 1739
+ − 1740 You are forbidden, however, from creating alias loops or `dangling' aliases.
+ − 1741 These will be detected, and an error will be signaled if you attempt to do so.
+ − 1742
+ − 1743 If ALIASEE is nil, then ALIAS will simply be undefined.
+ − 1744
+ − 1745 See also `coding-system-alias-p', `coding-system-aliasee',
+ − 1746 and `coding-system-canonical-name-p'.
+ − 1747 */
+ − 1748 (alias, aliasee))
+ − 1749 {
+ − 1750 Lisp_Object real_coding_system, probe;
+ − 1751
+ − 1752 CHECK_SYMBOL (alias);
+ − 1753
+ − 1754 if (!NILP (Fcoding_system_canonical_name_p (alias)))
563
+ − 1755 invalid_change
440
+ − 1756 ("Symbol is the canonical name of a coding system and cannot be redefined",
+ − 1757 alias);
+ − 1758
+ − 1759 if (NILP (aliasee))
+ − 1760 {
771
+ − 1761 Lisp_Object subsidiary_unix = add_suffix_to_symbol (alias, "-unix");
+ − 1762 Lisp_Object subsidiary_dos = add_suffix_to_symbol (alias, "-dos");
+ − 1763 Lisp_Object subsidiary_mac = add_suffix_to_symbol (alias, "-mac");
440
+ − 1764
+ − 1765 Fremhash (alias, Vcoding_system_hash_table);
+ − 1766
+ − 1767 /* Undefine subsidiary aliases,
+ − 1768 presumably created by a previous call to this function */
+ − 1769 if (! NILP (Fcoding_system_alias_p (subsidiary_unix)) &&
+ − 1770 ! NILP (Fcoding_system_alias_p (subsidiary_dos)) &&
+ − 1771 ! NILP (Fcoding_system_alias_p (subsidiary_mac)))
+ − 1772 {
+ − 1773 Fdefine_coding_system_alias (subsidiary_unix, Qnil);
+ − 1774 Fdefine_coding_system_alias (subsidiary_dos, Qnil);
+ − 1775 Fdefine_coding_system_alias (subsidiary_mac, Qnil);
+ − 1776 }
+ − 1777
+ − 1778 /* Undefine dangling coding system aliases. */
+ − 1779 {
+ − 1780 int dangling_aliases;
+ − 1781
+ − 1782 do {
+ − 1783 dangling_aliases = 0;
+ − 1784 elisp_map_remhash (dangling_coding_system_alias_p,
+ − 1785 Vcoding_system_hash_table,
+ − 1786 &dangling_aliases);
+ − 1787 } while (dangling_aliases > 0);
+ − 1788 }
+ − 1789
+ − 1790 return Qnil;
+ − 1791 }
+ − 1792
+ − 1793 if (CODING_SYSTEMP (aliasee))
+ − 1794 aliasee = XCODING_SYSTEM_NAME (aliasee);
+ − 1795
+ − 1796 /* Checks that aliasee names a coding-system */
+ − 1797 real_coding_system = Fget_coding_system (aliasee);
+ − 1798
+ − 1799 /* Check for coding system alias loops */
+ − 1800 if (EQ (alias, aliasee))
563
+ − 1801 alias_loop: invalid_operation_2
440
+ − 1802 ("Attempt to create a coding system alias loop", alias, aliasee);
+ − 1803
+ − 1804 for (probe = aliasee;
+ − 1805 SYMBOLP (probe);
+ − 1806 probe = Fgethash (probe, Vcoding_system_hash_table, Qzero))
+ − 1807 {
+ − 1808 if (EQ (probe, alias))
+ − 1809 goto alias_loop;
+ − 1810 }
+ − 1811
+ − 1812 Fputhash (alias, aliasee, Vcoding_system_hash_table);
+ − 1813
+ − 1814 /* Set up aliases for subsidiaries.
+ − 1815 #### There must be a better way to handle subsidiary coding systems. */
+ − 1816 {
+ − 1817 static const char *suffixes[] = { "-unix", "-dos", "-mac" };
+ − 1818 int i;
+ − 1819 for (i = 0; i < countof (suffixes); i++)
+ − 1820 {
+ − 1821 Lisp_Object alias_subsidiary =
771
+ − 1822 add_suffix_to_symbol (alias, suffixes[i]);
440
+ − 1823 Lisp_Object aliasee_subsidiary =
771
+ − 1824 add_suffix_to_symbol (aliasee, suffixes[i]);
440
+ − 1825
+ − 1826 if (! NILP (Ffind_coding_system (aliasee_subsidiary)))
+ − 1827 Fdefine_coding_system_alias (alias_subsidiary, aliasee_subsidiary);
+ − 1828 }
+ − 1829 }
428
+ − 1830 /* FSF return value is a vector of [ALIAS-unix ALIAS-dos ALIAS-mac],
+ − 1831 but it doesn't look intentional, so I'd rather return something
+ − 1832 meaningful or nothing at all. */
+ − 1833 return Qnil;
+ − 1834 }
+ − 1835
+ − 1836 static Lisp_Object
771
+ − 1837 subsidiary_coding_system (Lisp_Object coding_system, enum eol_type type)
428
+ − 1838 {
+ − 1839 Lisp_Coding_System *cs = XCODING_SYSTEM (coding_system);
+ − 1840 Lisp_Object new_coding_system;
+ − 1841
+ − 1842 switch (type)
+ − 1843 {
+ − 1844 case EOL_AUTODETECT: return coding_system;
+ − 1845 case EOL_LF: new_coding_system = CODING_SYSTEM_EOL_LF (cs); break;
+ − 1846 case EOL_CR: new_coding_system = CODING_SYSTEM_EOL_CR (cs); break;
+ − 1847 case EOL_CRLF: new_coding_system = CODING_SYSTEM_EOL_CRLF (cs); break;
442
+ − 1848 default: abort (); return Qnil;
428
+ − 1849 }
+ − 1850
+ − 1851 return NILP (new_coding_system) ? coding_system : new_coding_system;
+ − 1852 }
+ − 1853
+ − 1854 DEFUN ("subsidiary-coding-system", Fsubsidiary_coding_system, 2, 2, 0, /*
+ − 1855 Return the subsidiary coding system of CODING-SYSTEM with eol type EOL-TYPE.
771
+ − 1856 The logically opposite operation is `coding-system-base'.
428
+ − 1857 */
+ − 1858 (coding_system, eol_type))
+ − 1859 {
771
+ − 1860 coding_system = get_coding_system_for_text_file (coding_system, 0);
428
+ − 1861
+ − 1862 return subsidiary_coding_system (coding_system,
+ − 1863 symbol_to_eol_type (eol_type));
+ − 1864 }
+ − 1865
771
+ − 1866 DEFUN ("coding-system-base", Fcoding_system_base,
+ − 1867 1, 1, 0, /*
+ − 1868 Return the base coding system of CODING-SYSTEM.
+ − 1869 If CODING-SYSTEM is a subsidiary, this returns its parent; otherwise, it
+ − 1870 returns CODING-SYSTEM.
+ − 1871 The logically opposite operation is `subsidiary-coding-system'.
+ − 1872 */
+ − 1873 (coding_system))
+ − 1874 {
+ − 1875 Lisp_Object base;
+ − 1876
+ − 1877 coding_system = Fget_coding_system (coding_system);
+ − 1878 if (EQ (XCODING_SYSTEM_NAME (coding_system), Qbinary))
+ − 1879 return Fget_coding_system (Qraw_text); /* hack! */
+ − 1880 base = XCODING_SYSTEM_SUBSIDIARY_PARENT (coding_system);
+ − 1881 if (!NILP (base))
+ − 1882 return base;
+ − 1883 return coding_system;
+ − 1884 }
+ − 1885
+ − 1886 DEFUN ("coding-system-used-for-io", Fcoding_system_used_for_io,
+ − 1887 1, 1, 0, /*
+ − 1888 Return the coding system actually used for I/O.
+ − 1889 In some cases (e.g. when a particular EOL type is specified) this won't be
+ − 1890 the coding system itself. This can be useful when trying to track down
+ − 1891 more closely how exactly data is decoded.
+ − 1892 */
+ − 1893 (coding_system))
+ − 1894 {
+ − 1895 Lisp_Object canon;
+ − 1896
+ − 1897 coding_system = Fget_coding_system (coding_system);
+ − 1898 canon = XCODING_SYSTEM_CANONICAL (coding_system);
+ − 1899 if (!NILP (canon))
+ − 1900 return canon;
+ − 1901 return coding_system;
+ − 1902 }
+ − 1903
428
+ − 1904
+ − 1905 /************************************************************************/
+ − 1906 /* Coding system accessors */
+ − 1907 /************************************************************************/
+ − 1908
771
+ − 1909 DEFUN ("coding-system-description", Fcoding_system_description, 1, 1, 0, /*
+ − 1910 Return the description for CODING-SYSTEM.
+ − 1911 The `description' of a coding system is a short English phrase giving the
+ − 1912 name rendered according to English punctuation rules, plus possibly some
+ − 1913 explanatory text (typically in the form of a parenthetical phrase). The
+ − 1914 description is intended to be short enough that it can appear as a menu item,
+ − 1915 and clear enough to be recognizable even to someone who is assumed to have
+ − 1916 some basic familiarity with different encodings but may not know all the
+ − 1917 technical names; thus, for `cn-gb-2312' is described as "Chinese EUC" and
+ − 1918 `hz-gb-2312' is described as "Hz/ZW (Chinese)", where the actual name of
+ − 1919 the encoding is given, followed by a note that this is a Chinese encoding,
+ − 1920 because the great majority of people encountering this would have no idea
+ − 1921 what it is, and giving the language indicates whether the encoding should
+ − 1922 just be ignored or (conceivably) investigated more thoroughly.
428
+ − 1923 */
+ − 1924 (coding_system))
+ − 1925 {
+ − 1926 coding_system = Fget_coding_system (coding_system);
771
+ − 1927 return XCODING_SYSTEM_DESCRIPTION (coding_system);
428
+ − 1928 }
+ − 1929
+ − 1930 DEFUN ("coding-system-type", Fcoding_system_type, 1, 1, 0, /*
+ − 1931 Return the type of CODING-SYSTEM.
+ − 1932 */
+ − 1933 (coding_system))
+ − 1934 {
771
+ − 1935 coding_system = Fget_coding_system (coding_system);
+ − 1936 return XCODING_SYSTEM_TYPE (coding_system);
428
+ − 1937 }
+ − 1938
+ − 1939 DEFUN ("coding-system-property", Fcoding_system_property, 2, 2, 0, /*
+ − 1940 Return the PROP property of CODING-SYSTEM.
+ − 1941 */
+ − 1942 (coding_system, prop))
+ − 1943 {
+ − 1944 coding_system = Fget_coding_system (coding_system);
+ − 1945 CHECK_SYMBOL (prop);
+ − 1946
+ − 1947 if (EQ (prop, Qname))
+ − 1948 return XCODING_SYSTEM_NAME (coding_system);
+ − 1949 else if (EQ (prop, Qtype))
+ − 1950 return Fcoding_system_type (coding_system);
771
+ − 1951 else if (EQ (prop, Qdescription))
+ − 1952 return XCODING_SYSTEM_DESCRIPTION (coding_system);
428
+ − 1953 else if (EQ (prop, Qmnemonic))
+ − 1954 return XCODING_SYSTEM_MNEMONIC (coding_system);
771
+ − 1955 else if (EQ (prop, Qdocumentation))
+ − 1956 return XCODING_SYSTEM_DOCUMENTATION (coding_system);
428
+ − 1957 else if (EQ (prop, Qeol_type))
771
+ − 1958 return eol_type_to_symbol (XCODING_SYSTEM_EOL_TYPE
+ − 1959 (coding_system));
428
+ − 1960 else if (EQ (prop, Qeol_lf))
+ − 1961 return XCODING_SYSTEM_EOL_LF (coding_system);
+ − 1962 else if (EQ (prop, Qeol_crlf))
+ − 1963 return XCODING_SYSTEM_EOL_CRLF (coding_system);
+ − 1964 else if (EQ (prop, Qeol_cr))
+ − 1965 return XCODING_SYSTEM_EOL_CR (coding_system);
+ − 1966 else if (EQ (prop, Qpost_read_conversion))
+ − 1967 return XCODING_SYSTEM_POST_READ_CONVERSION (coding_system);
+ − 1968 else if (EQ (prop, Qpre_write_conversion))
+ − 1969 return XCODING_SYSTEM_PRE_WRITE_CONVERSION (coding_system);
771
+ − 1970 else
+ − 1971 {
+ − 1972 Lisp_Object value = CODESYSMETH_OR_GIVEN (XCODING_SYSTEM (coding_system),
+ − 1973 getprop,
+ − 1974 (coding_system, prop),
+ − 1975 Qunbound);
+ − 1976 if (UNBOUNDP (value))
+ − 1977 invalid_constant ("Unrecognized property", prop);
+ − 1978 return value;
+ − 1979 }
+ − 1980 }
+ − 1981
+ − 1982
+ − 1983 /************************************************************************/
+ − 1984 /* Coding stream functions */
+ − 1985 /************************************************************************/
+ − 1986
+ − 1987 /* A coding stream is a stream used for encoding or decoding text. The
+ − 1988 coding-stream object keeps track of the actual coding system, the stream
+ − 1989 that is at the other end, and data that needs to be persistent across
+ − 1990 the lifetime of the stream. */
+ − 1991
+ − 1992 DEFINE_LSTREAM_IMPLEMENTATION ("coding", coding);
+ − 1993
+ − 1994 /* Encoding and decoding are parallel operations, so we create just one
+ − 1995 stream for both. "Decoding" may involve the extra step of autodetection
+ − 1996 of the data format, but that's only because of the conventional
+ − 1997 definition of decoding as converting from external- to
+ − 1998 internal-formatted data.
+ − 1999
+ − 2000 #### We really need to abstract out the concept of "data formats" and
+ − 2001 define "converters" that convert from and to specified formats,
+ − 2002 eliminating the idea of decoding and encoding. When specifying a
+ − 2003 conversion process, we need to give the data formats themselves, not the
+ − 2004 conversion processes -- e.g. a coding system called "Unicode->multibyte"
+ − 2005 converts in both directions, and we could auto-detect the format of data
+ − 2006 at either end. */
+ − 2007
+ − 2008 static Bytecount
+ − 2009 coding_reader (Lstream *stream, unsigned char *data, Bytecount size)
+ − 2010 {
+ − 2011 unsigned char *orig_data = data;
+ − 2012 Bytecount read_size;
+ − 2013 int error_occurred = 0;
+ − 2014 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2015
+ − 2016 /* We need to interface to coding_{de,en}code_1(), which expects to take
+ − 2017 some amount of data and store the result into a Dynarr. We have
+ − 2018 coding_{de,en}code_1() store into c->runoff, and take data from there
+ − 2019 as necessary. */
+ − 2020
+ − 2021 /* We loop until we have enough data, reading chunks from the other
+ − 2022 end and converting it. */
+ − 2023 while (1)
+ − 2024 {
+ − 2025 /* Take data from convert_to if we can. Make sure to take at
+ − 2026 most SIZE bytes, and delete the data from convert_to. */
+ − 2027 if (Dynarr_length (str->convert_to) > 0)
+ − 2028 {
+ − 2029 Bytecount chunk =
+ − 2030 min (size, (Bytecount) Dynarr_length (str->convert_to));
+ − 2031 memcpy (data, Dynarr_atp (str->convert_to, 0), chunk);
+ − 2032 Dynarr_delete_many (str->convert_to, 0, chunk);
+ − 2033 data += chunk;
+ − 2034 size -= chunk;
+ − 2035 }
+ − 2036
+ − 2037 if (size == 0)
+ − 2038 break; /* No more room for data */
+ − 2039
+ − 2040 if (str->eof)
+ − 2041 break;
+ − 2042
+ − 2043 {
+ − 2044 /* Exhausted convert_to, so get some more. Read into convert_from,
+ − 2045 after existing "rejected" data from the last conversion. */
+ − 2046 Bytecount rejected = Dynarr_length (str->convert_from);
+ − 2047 /* #### 1024 is arbitrary; we really need to separate 0 from EOF,
+ − 2048 and when we get 0, keep taking more data until we don't get 0 --
+ − 2049 we don't know how much data the conversion routine might need
+ − 2050 before it can generate any data of its own */
814
+ − 2051 Bytecount readmore =
+ − 2052 str->one_byte_at_a_time ? (Bytecount) 1 :
+ − 2053 max (size, (Bytecount) 1024);
771
+ − 2054
+ − 2055 Dynarr_add_many (str->convert_from, 0, readmore);
+ − 2056 read_size = Lstream_read (str->other_end,
+ − 2057 Dynarr_atp (str->convert_from, rejected),
+ − 2058 readmore);
+ − 2059 /* Trim size down to how much we actually got */
+ − 2060 Dynarr_set_size (str->convert_from, rejected + max (0, read_size));
+ − 2061 }
+ − 2062
+ − 2063 if (read_size < 0) /* LSTREAM_ERROR */
+ − 2064 {
+ − 2065 error_occurred = 1;
+ − 2066 break;
+ − 2067 }
+ − 2068 if (read_size == 0) /* LSTREAM_EOF */
+ − 2069 /* There might be some more end data produced in the translation,
+ − 2070 so we set a flag and call the conversion method once more to
+ − 2071 output any final stuff it may be holding, any "go back to a sane
+ − 2072 state" escape sequences, etc. The conversion method is free to
+ − 2073 look at this flag, and we use it above to stop looping. */
+ − 2074 str->eof = 1;
+ − 2075 {
+ − 2076 Bytecount processed;
+ − 2077 Bytecount to_process = Dynarr_length (str->convert_from);
+ − 2078
+ − 2079 /* Convert the data, and save any rejected data in convert_from */
+ − 2080 processed =
+ − 2081 XCODESYSMETH (str->codesys, convert,
+ − 2082 (str, Dynarr_atp (str->convert_from, 0),
+ − 2083 str->convert_to, to_process));
+ − 2084 if (processed < 0)
+ − 2085 {
+ − 2086 error_occurred = 1;
+ − 2087 break;
+ − 2088 }
+ − 2089 assert (processed <= to_process);
+ − 2090 if (processed < to_process)
+ − 2091 memmove (Dynarr_atp (str->convert_from, 0),
+ − 2092 Dynarr_atp (str->convert_from, processed),
+ − 2093 to_process - processed);
+ − 2094 Dynarr_set_size (str->convert_from, to_process - processed);
+ − 2095 }
+ − 2096 }
+ − 2097
+ − 2098 if (data - orig_data == 0)
+ − 2099 return error_occurred ? -1 : 0;
+ − 2100 else
+ − 2101 return data - orig_data;
+ − 2102 }
+ − 2103
+ − 2104 static Bytecount
+ − 2105 coding_writer (Lstream *stream, const unsigned char *data, Bytecount size)
+ − 2106 {
+ − 2107 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2108
+ − 2109 /* Convert all our data into convert_to, and then attempt to write
+ − 2110 it all out to the other end. */
+ − 2111 Dynarr_reset (str->convert_to);
+ − 2112 size = XCODESYSMETH (str->codesys, convert,
+ − 2113 (str, data, str->convert_to, size));
+ − 2114 if (Lstream_write (str->other_end, Dynarr_atp (str->convert_to, 0),
+ − 2115 Dynarr_length (str->convert_to)) < 0)
+ − 2116 return -1;
+ − 2117 else
+ − 2118 /* The return value indicates how much of the incoming data was
+ − 2119 processed, not how many bytes were written. */
+ − 2120 return size;
+ − 2121 }
+ − 2122
+ − 2123 static int
+ − 2124 encode_decode_source_sink_type_is_char (Lisp_Object cs,
+ − 2125 enum source_or_sink sex,
+ − 2126 enum encode_decode direction)
+ − 2127 {
+ − 2128 return (direction == CODING_DECODE ?
+ − 2129 decoding_source_sink_type_is_char (cs, sex) :
+ − 2130 encoding_source_sink_type_is_char (cs, sex));
+ − 2131 }
+ − 2132
+ − 2133 /* Ensure that the convert methods only get full characters sent to them to
+ − 2134 convert if the source of that conversion is characters; and that no such
+ − 2135 full-character checking happens when the source is bytes. Keep in mind
+ − 2136 that (1) the conversion_end_type return values take the perspective of
+ − 2137 encoding; (2) the source for decoding is the same as the sink for
+ − 2138 encoding; (3) when writing, the data is given to us, and we set our own
+ − 2139 stream to be character mode or not; (4) when reading, the data comes
+ − 2140 from the other_end stream, and we set that one to be character mode or
+ − 2141 not. This is consistent with the comment above the prototype for
+ − 2142 Lstream_set_character_mode(), which lays out rules for who is allowed to
+ − 2143 modify the character type mode on a stream.
+ − 2144
814
+ − 2145 If we're a read stream, we're always setting character mode on the
+ − 2146 source, but we also set it on ourselves consistent with the flag that
+ − 2147 can disable this (see again the comment above
+ − 2148 Lstream_set_character_mode()).
+ − 2149 */
771
+ − 2150
+ − 2151 static void
+ − 2152 set_coding_character_mode (Lstream *stream)
+ − 2153 {
+ − 2154 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2155 Lstream *stream_to_set =
+ − 2156 stream->flags & LSTREAM_FL_WRITE ? stream : str->other_end;
+ − 2157 if (encode_decode_source_sink_type_is_char
+ − 2158 (str->codesys, CODING_SOURCE, str->direction))
+ − 2159 Lstream_set_character_mode (stream_to_set);
+ − 2160 else
+ − 2161 Lstream_unset_character_mode (stream_to_set);
814
+ − 2162 if (str->set_char_mode_on_us_when_reading &&
+ − 2163 (stream->flags & LSTREAM_FL_READ))
+ − 2164 {
+ − 2165 if (encode_decode_source_sink_type_is_char
+ − 2166 (str->codesys, CODING_SINK, str->direction))
+ − 2167 Lstream_set_character_mode (stream);
+ − 2168 else
+ − 2169 Lstream_unset_character_mode (stream);
+ − 2170 }
771
+ − 2171 }
+ − 2172
+ − 2173 static Lisp_Object
+ − 2174 coding_marker (Lisp_Object stream)
+ − 2175 {
+ − 2176 struct coding_stream *str = CODING_STREAM_DATA (XLSTREAM (stream));
+ − 2177
+ − 2178 mark_object (str->orig_codesys);
+ − 2179 mark_object (str->codesys);
+ − 2180 MAYBE_XCODESYSMETH (str->codesys, mark_coding_stream, (str));
+ − 2181 return wrap_lstream (str->other_end);
+ − 2182 }
+ − 2183
+ − 2184 static int
+ − 2185 coding_rewinder (Lstream *stream)
+ − 2186 {
+ − 2187 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2188 MAYBE_XCODESYSMETH (str->codesys, rewind_coding_stream, (str));
+ − 2189
+ − 2190 str->ch = 0;
+ − 2191 Dynarr_reset (str->convert_to);
+ − 2192 Dynarr_reset (str->convert_from);
+ − 2193 return Lstream_rewind (str->other_end);
+ − 2194 }
+ − 2195
+ − 2196 static int
+ − 2197 coding_seekable_p (Lstream *stream)
+ − 2198 {
+ − 2199 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2200 return Lstream_seekable_p (str->other_end);
+ − 2201 }
+ − 2202
+ − 2203 static int
+ − 2204 coding_flusher (Lstream *stream)
+ − 2205 {
+ − 2206 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2207 return Lstream_flush (str->other_end);
+ − 2208 }
+ − 2209
+ − 2210 static int
+ − 2211 coding_closer (Lstream *stream)
+ − 2212 {
+ − 2213 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2214 if (stream->flags & LSTREAM_FL_WRITE)
+ − 2215 {
+ − 2216 str->eof = 1;
+ − 2217 coding_writer (stream, 0, 0);
+ − 2218 str->eof = 0;
+ − 2219 }
+ − 2220 /* It's safe to free the runoff dynarrs now because they are used only
+ − 2221 during conversion. We need to keep the type-specific data around,
+ − 2222 though, because of canonicalize_after_coding. */
+ − 2223 if (str->convert_to)
+ − 2224 {
+ − 2225 Dynarr_free (str->convert_to);
+ − 2226 str->convert_to = 0;
+ − 2227 }
+ − 2228 if (str->convert_from)
428
+ − 2229 {
771
+ − 2230 Dynarr_free (str->convert_from);
+ − 2231 str->convert_from = 0;
+ − 2232 }
+ − 2233
800
+ − 2234 if (str->no_close_other)
+ − 2235 return Lstream_flush (str->other_end);
+ − 2236 else
+ − 2237 return Lstream_close (str->other_end);
771
+ − 2238 }
+ − 2239
+ − 2240 static void
+ − 2241 coding_finalizer (Lstream *stream)
+ − 2242 {
+ − 2243 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2244
+ − 2245 assert (!str->finalized);
+ − 2246 MAYBE_XCODESYSMETH (str->codesys, finalize_coding_stream, (str));
+ − 2247 if (str->data)
+ − 2248 {
+ − 2249 xfree (str->data);
+ − 2250 str->data = 0;
+ − 2251 }
+ − 2252 str->finalized = 1;
+ − 2253 }
+ − 2254
+ − 2255 static Lisp_Object
+ − 2256 coding_stream_canonicalize_after_coding (Lstream *stream)
+ − 2257 {
+ − 2258 struct coding_stream *str = CODING_STREAM_DATA (stream);
+ − 2259
+ − 2260 return XCODESYSMETH_OR_GIVEN (str->codesys, canonicalize_after_coding,
+ − 2261 (str), str->codesys);
+ − 2262 }
+ − 2263
+ − 2264 Lisp_Object
+ − 2265 coding_stream_detected_coding_system (Lstream *stream)
+ − 2266 {
+ − 2267 Lisp_Object codesys =
+ − 2268 coding_stream_canonicalize_after_coding (stream);
+ − 2269 if (NILP (codesys))
+ − 2270 return Fget_coding_system (Qidentity);
+ − 2271 return codesys;
+ − 2272 }
+ − 2273
+ − 2274 Lisp_Object
+ − 2275 coding_stream_coding_system (Lstream *stream)
+ − 2276 {
+ − 2277 return CODING_STREAM_DATA (stream)->codesys;
+ − 2278 }
+ − 2279
+ − 2280 /* Change the coding system associated with a stream. */
+ − 2281
+ − 2282 void
+ − 2283 set_coding_stream_coding_system (Lstream *lstr, Lisp_Object codesys)
+ − 2284 {
+ − 2285 struct coding_stream *str = CODING_STREAM_DATA (lstr);
+ − 2286 if (EQ (str->orig_codesys, codesys))
+ − 2287 return;
+ − 2288 /* We do the equivalent of closing the stream, destroying it, and
+ − 2289 reinitializing it. This includes flushing out the data and signalling
+ − 2290 EOF, if we're a writing stream; we also replace the type-specific data
+ − 2291 with the data appropriate for the new coding system. */
+ − 2292 if (!NILP (str->codesys))
+ − 2293 {
+ − 2294 if (lstr->flags & LSTREAM_FL_WRITE)
+ − 2295 {
+ − 2296 Lstream_flush (lstr);
+ − 2297 str->eof = 1;
+ − 2298 coding_writer (lstr, 0, 0);
+ − 2299 str->eof = 0;
+ − 2300 }
+ − 2301 MAYBE_XCODESYSMETH (str->codesys, finalize_coding_stream, (str));
+ − 2302 }
+ − 2303 str->orig_codesys = codesys;
+ − 2304 str->codesys = coding_system_real_canonical (codesys);
+ − 2305
+ − 2306 if (str->data)
+ − 2307 {
+ − 2308 xfree (str->data);
+ − 2309 str->data = 0;
428
+ − 2310 }
771
+ − 2311 if (XCODING_SYSTEM_METHODS (str->codesys)->coding_data_size)
+ − 2312 str->data =
+ − 2313 xmalloc_and_zero (XCODING_SYSTEM_METHODS (str->codesys)->
+ − 2314 coding_data_size);
+ − 2315 MAYBE_XCODESYSMETH (str->codesys, init_coding_stream, (str));
+ − 2316 /* The new coding system may have different ideas regarding whether its
+ − 2317 ends are characters or bytes. */
+ − 2318 set_coding_character_mode (lstr);
+ − 2319 }
+ − 2320
+ − 2321 /* WARNING WARNING WARNING WARNING!!!!! If you open up a coding
+ − 2322 stream for writing, no automatic code detection will be performed.
+ − 2323 The reason for this is that automatic code detection requires a
+ − 2324 seekable input. Things will also fail if you open a coding
+ − 2325 stream for reading using a non-fully-specified coding system and
+ − 2326 a non-seekable input stream. */
+ − 2327
+ − 2328 static Lisp_Object
+ − 2329 make_coding_stream_1 (Lstream *stream, Lisp_Object codesys,
800
+ − 2330 const char *mode, enum encode_decode direction,
802
+ − 2331 int flags)
771
+ − 2332 {
+ − 2333 Lstream *lstr = Lstream_new (lstream_coding, mode);
+ − 2334 struct coding_stream *str = CODING_STREAM_DATA (lstr);
+ − 2335
+ − 2336 codesys = Fget_coding_system (codesys);
+ − 2337 xzero (*str);
+ − 2338 str->codesys = Qnil;
+ − 2339 str->orig_codesys = Qnil;
+ − 2340 str->us = lstr;
+ − 2341 str->other_end = stream;
+ − 2342 str->convert_to = Dynarr_new (unsigned_char);
+ − 2343 str->convert_from = Dynarr_new (unsigned_char);
+ − 2344 str->direction = direction;
814
+ − 2345 if (flags & LSTREAM_FL_NO_CLOSE_OTHER)
802
+ − 2346 str->no_close_other = 1;
814
+ − 2347 if (flags & LSTREAM_FL_READ_ONE_BYTE_AT_A_TIME)
802
+ − 2348 str->one_byte_at_a_time = 1;
814
+ − 2349 if (!(flags & LSTREAM_FL_NO_INIT_CHAR_MODE_WHEN_READING))
+ − 2350 str->set_char_mode_on_us_when_reading = 1;
802
+ − 2351
771
+ − 2352 set_coding_stream_coding_system (lstr, codesys);
793
+ − 2353 return wrap_lstream (lstr);
771
+ − 2354 }
+ − 2355
814
+ − 2356 /* FLAGS:
+ − 2357
+ − 2358 LSTREAM_FL_NO_CLOSE_OTHER
+ − 2359 Don't close STREAM (the stream at the other end) when this stream is
+ − 2360 closed.
+ − 2361
+ − 2362 LSTREAM_FL_READ_ONE_BYTE_AT_A_TIME
+ − 2363 When reading from STREAM, read and process one byte at a time rather
+ − 2364 than in large chunks. This is for reading from TTY's, so we don't
+ − 2365 block. #### We should instead create a non-blocking filedesc stream
+ − 2366 that emulates the behavior as necessary using select(), when the
+ − 2367 fcntls don't work. (As seems to be the case on Cygwin.)
+ − 2368
+ − 2369 LSTREAM_FL_NO_INIT_CHAR_MODE_WHEN_READING
+ − 2370 When reading from STREAM, read and process one byte at a time rather
+ − 2371 than in large chunks. This is for reading from TTY's, so we don't
+ − 2372 block. #### We should instead create a non-blocking filedesc stream
+ − 2373 that emulates the behavior as necessary using select(), when the
+ − 2374 fcntls don't work. (As seems to be the case on Cygwin.)
+ − 2375 */
771
+ − 2376 Lisp_Object
+ − 2377 make_coding_input_stream (Lstream *stream, Lisp_Object codesys,
802
+ − 2378 enum encode_decode direction, int flags)
771
+ − 2379 {
800
+ − 2380 return make_coding_stream_1 (stream, codesys, "r", direction,
802
+ − 2381 flags);
771
+ − 2382 }
+ − 2383
814
+ − 2384 /* FLAGS:
+ − 2385
+ − 2386 LSTREAM_FL_NO_CLOSE_OTHER
+ − 2387 Don't close STREAM (the stream at the other end) when this stream is
+ − 2388 closed.
+ − 2389 */
771
+ − 2390 Lisp_Object
+ − 2391 make_coding_output_stream (Lstream *stream, Lisp_Object codesys,
802
+ − 2392 enum encode_decode direction, int flags)
771
+ − 2393 {
800
+ − 2394 return make_coding_stream_1 (stream, codesys, "w", direction,
802
+ − 2395 flags);
771
+ − 2396 }
+ − 2397
+ − 2398 static Lisp_Object
+ − 2399 encode_decode_coding_region (Lisp_Object start, Lisp_Object end,
+ − 2400 Lisp_Object coding_system, Lisp_Object buffer,
+ − 2401 enum encode_decode direction)
+ − 2402 {
+ − 2403 Charbpos b, e;
+ − 2404 struct buffer *buf = decode_buffer (buffer, 0);
+ − 2405 Lisp_Object instream = Qnil, to_outstream = Qnil, outstream = Qnil;
+ − 2406 Lisp_Object from_outstream = Qnil, auto_outstream = Qnil;
+ − 2407 Lisp_Object lb_outstream = Qnil;
+ − 2408 Lisp_Object next;
+ − 2409 Lstream *istr, *ostr;
+ − 2410 struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;
+ − 2411 struct gcpro ngcpro1;
+ − 2412 int source_char, sink_char;
+ − 2413
+ − 2414 get_buffer_range_char (buf, start, end, &b, &e, 0);
+ − 2415 barf_if_buffer_read_only (buf, b, e);
+ − 2416
+ − 2417 GCPRO5 (instream, to_outstream, outstream, from_outstream, lb_outstream);
+ − 2418 NGCPRO1 (auto_outstream);
+ − 2419
+ − 2420 coding_system = Fget_coding_system (coding_system);
+ − 2421 source_char = encode_decode_source_sink_type_is_char (coding_system,
+ − 2422 CODING_SOURCE,
+ − 2423 direction);
+ − 2424 sink_char = encode_decode_source_sink_type_is_char (coding_system,
+ − 2425 CODING_SINK,
+ − 2426 direction);
+ − 2427
+ − 2428 /* Order is IN <---> [TO] -> OUT -> [FROM] -> [AUTODETECT-EOL] -> LB */
+ − 2429 instream = make_lisp_buffer_input_stream (buf, b, e, 0);
+ − 2430 next = lb_outstream = make_lisp_buffer_output_stream (buf, b, 0);
+ − 2431
+ − 2432 if (direction == CODING_DECODE &&
+ − 2433 XCODING_SYSTEM_EOL_TYPE (coding_system) == EOL_AUTODETECT)
+ − 2434 next = auto_outstream =
+ − 2435 make_coding_output_stream
800
+ − 2436 (XLSTREAM (next), Fget_coding_system (Qconvert_eol_autodetect),
+ − 2437 CODING_DECODE, 0);
771
+ − 2438
+ − 2439 if (!sink_char)
+ − 2440 next = from_outstream =
800
+ − 2441 make_coding_output_stream (XLSTREAM (next), Qbinary, CODING_DECODE, 0);
771
+ − 2442 outstream = make_coding_output_stream (XLSTREAM (next), coding_system,
800
+ − 2443 direction, 0);
771
+ − 2444 if (!source_char)
428
+ − 2445 {
771
+ − 2446 to_outstream =
+ − 2447 make_coding_output_stream (XLSTREAM (outstream),
800
+ − 2448 Qbinary, CODING_ENCODE, 0);
771
+ − 2449 ostr = XLSTREAM (to_outstream);
+ − 2450 }
+ − 2451 else
+ − 2452 ostr = XLSTREAM (outstream);
+ − 2453 istr = XLSTREAM (instream);
+ − 2454
+ − 2455 /* The chain of streams looks like this:
+ − 2456
+ − 2457 [BUFFER] <----- send through
+ − 2458 ------> [CHAR->BYTE i.e. ENCODE AS BINARY if source is
+ − 2459 in bytes]
+ − 2460 ------> [ENCODE/DECODE AS SPECIFIED]
+ − 2461 ------> [BYTE->CHAR i.e. DECODE AS BINARY
+ − 2462 if sink is in bytes]
+ − 2463 ------> [AUTODETECT EOL if
+ − 2464 we're decoding and
+ − 2465 coding system calls
+ − 2466 for this]
+ − 2467 ------> [BUFFER]
+ − 2468 */
+ − 2469 while (1)
+ − 2470 {
+ − 2471 char tempbuf[1024]; /* some random amount */
+ − 2472 Charbpos newpos, even_newer_pos;
+ − 2473 Charbpos oldpos = lisp_buffer_stream_startpos (istr);
+ − 2474 Bytecount size_in_bytes =
+ − 2475 Lstream_read (istr, tempbuf, sizeof (tempbuf));
+ − 2476
+ − 2477 if (!size_in_bytes)
+ − 2478 break;
+ − 2479 newpos = lisp_buffer_stream_startpos (istr);
+ − 2480 Lstream_write (ostr, tempbuf, size_in_bytes);
+ − 2481 even_newer_pos = lisp_buffer_stream_startpos (istr);
+ − 2482 buffer_delete_range (buf, even_newer_pos - (newpos - oldpos),
+ − 2483 even_newer_pos, 0);
428
+ − 2484 }
771
+ − 2485
+ − 2486 {
+ − 2487 Charcount retlen =
+ − 2488 lisp_buffer_stream_startpos (XLSTREAM (instream)) - b;
+ − 2489 Lstream_close (istr);
+ − 2490 Lstream_close (ostr);
+ − 2491 NUNGCPRO;
+ − 2492 UNGCPRO;
+ − 2493 Lstream_delete (istr);
+ − 2494 if (!NILP (from_outstream))
+ − 2495 Lstream_delete (XLSTREAM (from_outstream));
+ − 2496 Lstream_delete (XLSTREAM (outstream));
+ − 2497 if (!NILP (to_outstream))
+ − 2498 Lstream_delete (XLSTREAM (to_outstream));
+ − 2499 if (!NILP (auto_outstream))
+ − 2500 Lstream_delete (XLSTREAM (auto_outstream));
+ − 2501 Lstream_delete (XLSTREAM (lb_outstream));
+ − 2502 return make_int (retlen);
+ − 2503 }
+ − 2504 }
+ − 2505
+ − 2506 DEFUN ("decode-coding-region", Fdecode_coding_region, 3, 4, 0, /*
+ − 2507 Decode the text between START and END which is encoded in CODING-SYSTEM.
+ − 2508 This is useful if you've read in encoded text from a file without decoding
+ − 2509 it (e.g. you read in a JIS-formatted file but used the `binary' or
+ − 2510 `no-conversion' coding system, so that it shows up as "^[$B!<!+^[(B").
+ − 2511 Return length of decoded text.
+ − 2512 BUFFER defaults to the current buffer if unspecified.
+ − 2513 */
+ − 2514 (start, end, coding_system, buffer))
+ − 2515 {
+ − 2516 return encode_decode_coding_region (start, end, coding_system, buffer,
+ − 2517 CODING_DECODE);
+ − 2518 }
+ − 2519
+ − 2520 DEFUN ("encode-coding-region", Fencode_coding_region, 3, 4, 0, /*
+ − 2521 Encode the text between START and END using CODING-SYSTEM.
+ − 2522 This will, for example, convert Japanese characters into stuff such as
+ − 2523 "^[$B!<!+^[(B" if you use the JIS encoding. Return length of encoded
+ − 2524 text. BUFFER defaults to the current buffer if unspecified.
+ − 2525 */
+ − 2526 (start, end, coding_system, buffer))
+ − 2527 {
+ − 2528 return encode_decode_coding_region (start, end, coding_system, buffer,
+ − 2529 CODING_ENCODE);
428
+ − 2530 }
+ − 2531
+ − 2532
+ − 2533 /************************************************************************/
771
+ − 2534 /* Chain methods */
428
+ − 2535 /************************************************************************/
+ − 2536
771
+ − 2537 /* #### Need a way to create "opposite-direction" coding systems. */
+ − 2538
+ − 2539 /* Chain two or more coding systems together to make a combination coding
+ − 2540 system. */
+ − 2541 DEFINE_CODING_SYSTEM_TYPE (chain);
+ − 2542
+ − 2543 struct chain_coding_system
+ − 2544 {
+ − 2545 /* List of coding systems, in decode order */
+ − 2546 Lisp_Object *chain;
+ − 2547 /* Number of coding systems in list */
+ − 2548 int count;
+ − 2549 /* Coding system to return as a result of canonicalize-after-coding */
+ − 2550 Lisp_Object canonicalize_after_coding;
+ − 2551 };
+ − 2552
+ − 2553 struct chain_coding_stream
+ − 2554 {
+ − 2555 int initted;
+ − 2556 /* Lstreams for chain coding system */
+ − 2557 Lisp_Object *lstreams;
+ − 2558 int lstream_count;
+ − 2559 };
+ − 2560
+ − 2561 static const struct lrecord_description lo_description_1[] = {
+ − 2562 { XD_LISP_OBJECT, 0 },
+ − 2563 { XD_END }
+ − 2564 };
+ − 2565
+ − 2566 static const struct struct_description lo_description = {
+ − 2567 sizeof (Lisp_Object),
+ − 2568 lo_description_1
+ − 2569 };
+ − 2570
+ − 2571 static const struct lrecord_description chain_coding_system_description[] = {
+ − 2572 { XD_INT,
+ − 2573 coding_system_data_offset + offsetof (struct chain_coding_system,
+ − 2574 count) },
+ − 2575 { XD_STRUCT_PTR,
+ − 2576 coding_system_data_offset + offsetof (struct chain_coding_system,
+ − 2577 chain),
+ − 2578 XD_INDIRECT (0, 0), &lo_description },
+ − 2579 { XD_LISP_OBJECT,
+ − 2580 coding_system_data_offset + offsetof (struct chain_coding_system,
+ − 2581 canonicalize_after_coding) },
+ − 2582 { XD_END }
+ − 2583 };
+ − 2584
+ − 2585 static Lisp_Object
+ − 2586 chain_canonicalize (Lisp_Object codesys)
+ − 2587 {
+ − 2588 /* We make use of the fact that this method is called at init time, after
+ − 2589 properties have been parsed. init_method is called too early. */
+ − 2590 /* #### It's not clear we need this whole chain-canonicalize mechanism
+ − 2591 any more. */
+ − 2592 Lisp_Object chain = Flist (XCODING_SYSTEM_CHAIN_COUNT (codesys),
+ − 2593 XCODING_SYSTEM_CHAIN_CHAIN (codesys));
+ − 2594 chain = Fcons (XCODING_SYSTEM_PRE_WRITE_CONVERSION (codesys),
+ − 2595 Fcons (XCODING_SYSTEM_POST_READ_CONVERSION (codesys),
+ − 2596 chain));
+ − 2597 Fputhash (chain, codesys, Vchain_canonicalize_hash_table);
+ − 2598 return codesys;
+ − 2599 }
+ − 2600
+ − 2601 static Lisp_Object
+ − 2602 chain_canonicalize_after_coding (struct coding_stream *str)
+ − 2603 {
+ − 2604 Lisp_Object cac =
+ − 2605 XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (str->codesys);
+ − 2606 if (!NILP (cac))
+ − 2607 return cac;
+ − 2608 return str->codesys;
+ − 2609 #if 0
+ − 2610 struct chain_coding_stream *data = CODING_STREAM_TYPE_DATA (str, chain);
+ − 2611 Lisp_Object us = str->codesys, codesys;
+ − 2612 int i;
+ − 2613 Lisp_Object chain;
+ − 2614 Lisp_Object tail;
+ − 2615 int changed = 0;
+ − 2616
+ − 2617 /* #### It's not clear we need this whole chain-canonicalize mechanism
+ − 2618 any more. */
+ − 2619 if (str->direction == CODING_ENCODE || !data->initted)
+ − 2620 return us;
+ − 2621
+ − 2622 chain = Flist (XCODING_SYSTEM_CHAIN_COUNT (us),
+ − 2623 XCODING_SYSTEM_CHAIN_CHAIN (us));
+ − 2624
+ − 2625 tail = chain;
+ − 2626 for (i = 0; i < XCODING_SYSTEM_CHAIN_COUNT (us); i++)
+ − 2627 {
+ − 2628 codesys = (coding_stream_canonicalize_after_coding
+ − 2629 (XLSTREAM (data->lstreams[i])));
+ − 2630 if (!EQ (codesys, XCAR (tail)))
+ − 2631 changed = 1;
+ − 2632 XCAR (tail) = codesys;
+ − 2633 tail = XCDR (tail);
+ − 2634 }
+ − 2635
+ − 2636 if (!changed)
+ − 2637 return us;
+ − 2638
+ − 2639 chain = delq_no_quit (Qnil, chain);
+ − 2640
+ − 2641 if (NILP (XCODING_SYSTEM_PRE_WRITE_CONVERSION (us)) &&
+ − 2642 NILP (XCODING_SYSTEM_POST_READ_CONVERSION (us)))
+ − 2643 {
+ − 2644 if (NILP (chain))
+ − 2645 return Qnil;
+ − 2646 if (NILP (XCDR (chain)))
+ − 2647 return XCAR (chain);
+ − 2648 }
+ − 2649
+ − 2650 codesys = Fgethash (Fcons (XCODING_SYSTEM_PRE_WRITE_CONVERSION (us),
+ − 2651 Fcons (XCODING_SYSTEM_POST_READ_CONVERSION (us),
+ − 2652 chain)), Vchain_canonicalize_hash_table,
+ − 2653 Qnil);
+ − 2654 if (!NILP (codesys))
+ − 2655 return codesys;
+ − 2656 return make_internal_coding_system
+ − 2657 (us, "internal-chain-canonicalizer-wrapper",
+ − 2658 Qchain, Qunbound, list2 (Qchain, chain));
+ − 2659 #endif /* 0 */
+ − 2660 }
+ − 2661
+ − 2662 static void
+ − 2663 chain_init (Lisp_Object codesys)
+ − 2664 {
+ − 2665 XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (codesys) = Qnil;
+ − 2666 }
+ − 2667
+ − 2668 static void
+ − 2669 chain_mark (Lisp_Object codesys)
+ − 2670 {
+ − 2671 int i;
+ − 2672
+ − 2673 for (i = 0; i < XCODING_SYSTEM_CHAIN_COUNT (codesys); i++)
+ − 2674 mark_object (XCODING_SYSTEM_CHAIN_CHAIN (codesys)[i]);
+ − 2675 mark_object (XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (codesys));
+ − 2676 }
+ − 2677
+ − 2678 static void
+ − 2679 chain_mark_coding_stream_1 (struct chain_coding_stream *data)
+ − 2680 {
+ − 2681 int i;
+ − 2682
+ − 2683 for (i = 0; i < data->lstream_count; i++)
+ − 2684 mark_object (data->lstreams[i]);
+ − 2685 }
+ − 2686
+ − 2687 static void
+ − 2688 chain_mark_coding_stream (struct coding_stream *str)
+ − 2689 {
+ − 2690 chain_mark_coding_stream_1 (CODING_STREAM_TYPE_DATA (str, chain));
+ − 2691 }
+ − 2692
+ − 2693 static void
+ − 2694 chain_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
+ − 2695 {
+ − 2696 int i;
+ − 2697
826
+ − 2698 write_c_string (printcharfun, "(");
771
+ − 2699 for (i = 0; i < XCODING_SYSTEM_CHAIN_COUNT (cs); i++)
+ − 2700 {
826
+ − 2701 write_c_string (printcharfun, i == 0 ? "" : "->");
771
+ − 2702 print_coding_system_in_print_method (XCODING_SYSTEM_CHAIN_CHAIN (cs)[i],
+ − 2703 printcharfun, escapeflag);
+ − 2704 }
+ − 2705 {
+ − 2706 Lisp_Object cac = XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (cs);
+ − 2707 if (!NILP (cac))
+ − 2708 {
+ − 2709 if (i > 0)
826
+ − 2710 write_c_string (printcharfun, " ");
+ − 2711 write_c_string (printcharfun, "canonicalize-after-coding=");
771
+ − 2712 print_coding_system_in_print_method (cac, printcharfun, escapeflag);
+ − 2713 }
+ − 2714 }
+ − 2715
826
+ − 2716 write_c_string (printcharfun, ")");
771
+ − 2717 }
+ − 2718
+ − 2719 static void
+ − 2720 chain_rewind_coding_stream_1 (struct chain_coding_stream *data)
+ − 2721 {
+ − 2722 /* Each will rewind the next; there is always at least one stream (the
+ − 2723 dynarr stream at the end) if we're initted */
+ − 2724 if (data->initted)
+ − 2725 Lstream_rewind (XLSTREAM (data->lstreams[0]));
+ − 2726 }
+ − 2727
+ − 2728 static void
+ − 2729 chain_rewind_coding_stream (struct coding_stream *str)
+ − 2730 {
+ − 2731 chain_rewind_coding_stream_1 (CODING_STREAM_TYPE_DATA (str, chain));
+ − 2732 }
+ − 2733
+ − 2734 static void
+ − 2735 chain_init_coding_streams_1 (struct chain_coding_stream *data,
+ − 2736 unsigned_char_dynarr *dst,
+ − 2737 int ncodesys, Lisp_Object *codesys,
+ − 2738 enum encode_decode direction)
+ − 2739 {
+ − 2740 int i;
+ − 2741 Lisp_Object lstream_out;
+ − 2742
+ − 2743 data->lstream_count = ncodesys + 1;
+ − 2744 data->lstreams = xnew_array (Lisp_Object, data->lstream_count);
+ − 2745
+ − 2746 lstream_out = make_dynarr_output_stream (dst);
+ − 2747 Lstream_set_buffering (XLSTREAM (lstream_out), LSTREAM_UNBUFFERED, 0);
+ − 2748 data->lstreams[data->lstream_count - 1] = lstream_out;
+ − 2749
+ − 2750 for (i = ncodesys - 1; i >= 0; i--)
+ − 2751 {
+ − 2752 data->lstreams[i] =
+ − 2753 make_coding_output_stream
+ − 2754 (XLSTREAM (lstream_out),
+ − 2755 codesys[direction == CODING_ENCODE ? ncodesys - (i + 1) : i],
800
+ − 2756 direction, 0);
771
+ − 2757 lstream_out = data->lstreams[i];
+ − 2758 Lstream_set_buffering (XLSTREAM (lstream_out), LSTREAM_UNBUFFERED,
+ − 2759 0);
+ − 2760 }
+ − 2761 data->initted = 1;
+ − 2762 }
+ − 2763
+ − 2764 static Bytecount
+ − 2765 chain_convert (struct coding_stream *str, const UExtbyte *src,
+ − 2766 unsigned_char_dynarr *dst, Bytecount n)
+ − 2767 {
+ − 2768 struct chain_coding_stream *data = CODING_STREAM_TYPE_DATA (str, chain);
+ − 2769
+ − 2770 if (str->eof)
+ − 2771 {
+ − 2772 /* Each will close the next; there is always at least one stream (the
+ − 2773 dynarr stream at the end) if we're initted. We need to close now
+ − 2774 because more data may be generated. */
+ − 2775 if (data->initted)
+ − 2776 Lstream_close (XLSTREAM (data->lstreams[0]));
+ − 2777 return n;
+ − 2778 }
+ − 2779
+ − 2780 if (!data->initted)
+ − 2781 chain_init_coding_streams_1
+ − 2782 (data, dst, XCODING_SYSTEM_CHAIN_COUNT (str->codesys),
+ − 2783 XCODING_SYSTEM_CHAIN_CHAIN (str->codesys), str->direction);
+ − 2784
+ − 2785 if (Lstream_write (XLSTREAM (data->lstreams[0]), src, n) < 0)
+ − 2786 return -1;
+ − 2787 return n;
+ − 2788 }
+ − 2789
+ − 2790 static void
+ − 2791 chain_finalize_coding_stream_1 (struct chain_coding_stream *data)
+ − 2792 {
+ − 2793 if (data->lstreams)
+ − 2794 {
+ − 2795 /* Order of deletion is important here! Delete from the head of the
+ − 2796 chain and work your way towards the tail. In general, when you
+ − 2797 delete an object, there should be *NO* pointers to it anywhere.
+ − 2798 Deleting back-to-front would be a problem because there are
+ − 2799 pointers going forward. If there were pointers in both
+ − 2800 directions, you'd have to disconnect the pointers to a particular
+ − 2801 object before deleting it. */
+ − 2802 if (!gc_in_progress)
+ − 2803 {
+ − 2804 int i;
+ − 2805 /* During GC, these objects are unmarked, and are about to be
+ − 2806 freed. We do NOT want them on the free list, and that will
+ − 2807 cause lots of nastiness including crashes. Just let them be
+ − 2808 freed normally. */
+ − 2809 for (i = 0; i < data->lstream_count; i++)
+ − 2810 Lstream_delete (XLSTREAM ((data->lstreams)[i]));
+ − 2811 }
+ − 2812 xfree (data->lstreams);
+ − 2813 }
+ − 2814 }
+ − 2815
+ − 2816 static void
+ − 2817 chain_finalize_coding_stream (struct coding_stream *str)
+ − 2818 {
+ − 2819 chain_finalize_coding_stream_1 (CODING_STREAM_TYPE_DATA (str, chain));
+ − 2820 }
+ − 2821
+ − 2822 static void
+ − 2823 chain_finalize (Lisp_Object c)
+ − 2824 {
+ − 2825 if (XCODING_SYSTEM_CHAIN_CHAIN (c))
+ − 2826 xfree (XCODING_SYSTEM_CHAIN_CHAIN (c));
+ − 2827 }
+ − 2828
428
+ − 2829 static int
771
+ − 2830 chain_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
+ − 2831 {
+ − 2832 if (EQ (key, Qchain))
+ − 2833 {
+ − 2834 Lisp_Object tail;
+ − 2835 Lisp_Object *cslist;
+ − 2836 int count = 0;
+ − 2837 int i;
+ − 2838
+ − 2839 EXTERNAL_LIST_LOOP (tail, value)
+ − 2840 {
+ − 2841 Fget_coding_system (XCAR (tail));
+ − 2842 count++;
+ − 2843 }
+ − 2844
+ − 2845 cslist = xnew_array (Lisp_Object, count);
+ − 2846 XCODING_SYSTEM_CHAIN_CHAIN (codesys) = cslist;
+ − 2847
+ − 2848 count = 0;
+ − 2849 EXTERNAL_LIST_LOOP (tail, value)
+ − 2850 {
+ − 2851 cslist[count] = Fget_coding_system (XCAR (tail));
+ − 2852 count++;
+ − 2853 }
+ − 2854
+ − 2855 XCODING_SYSTEM_CHAIN_COUNT (codesys) = count;
+ − 2856
+ − 2857 for (i = 0; i < count - 1; i++)
+ − 2858 {
+ − 2859 if (decoding_source_sink_type_is_char (cslist[i], CODING_SINK) !=
+ − 2860 decoding_source_sink_type_is_char (cslist[i + 1], CODING_SOURCE))
+ − 2861 invalid_argument_2 ("Sink of first must match source of second",
+ − 2862 cslist[i], cslist[i + 1]);
+ − 2863 }
+ − 2864 }
+ − 2865 else if (EQ (key, Qcanonicalize_after_coding))
+ − 2866 XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (codesys) =
+ − 2867 Fget_coding_system (value);
+ − 2868 else
+ − 2869 return 0;
+ − 2870 return 1;
+ − 2871 }
+ − 2872
+ − 2873 static Lisp_Object
+ − 2874 chain_getprop (Lisp_Object coding_system, Lisp_Object prop)
+ − 2875 {
+ − 2876 if (EQ (prop, Qchain))
+ − 2877 {
+ − 2878 Lisp_Object result = Qnil;
+ − 2879 int i;
+ − 2880
+ − 2881 for (i = 0; i < XCODING_SYSTEM_CHAIN_COUNT (coding_system); i++)
+ − 2882 result = Fcons (XCODING_SYSTEM_CHAIN_CHAIN (coding_system)[i],
+ − 2883 result);
+ − 2884
+ − 2885 return Fnreverse (result);
+ − 2886 }
+ − 2887 else if (EQ (prop, Qcanonicalize_after_coding))
+ − 2888 return XCODING_SYSTEM_CHAIN_CANONICALIZE_AFTER_CODING (coding_system);
+ − 2889 else
+ − 2890 return Qunbound;
+ − 2891 }
+ − 2892
+ − 2893 static enum source_sink_type
+ − 2894 chain_conversion_end_type (Lisp_Object codesys)
+ − 2895 {
+ − 2896 Lisp_Object *cslist = XCODING_SYSTEM_CHAIN_CHAIN (codesys);
+ − 2897 int n = XCODING_SYSTEM_CHAIN_COUNT (codesys);
+ − 2898 int charp_source, charp_sink;
+ − 2899
+ − 2900 if (n == 0)
+ − 2901 return DECODES_BYTE_TO_BYTE; /* arbitrary */
+ − 2902 charp_source = decoding_source_sink_type_is_char (cslist[0], CODING_SOURCE);
+ − 2903 charp_sink = decoding_source_sink_type_is_char (cslist[n - 1], CODING_SINK);
+ − 2904
+ − 2905 switch (charp_source * 2 + charp_sink)
+ − 2906 {
+ − 2907 case 0: return DECODES_BYTE_TO_BYTE;
+ − 2908 case 1: return DECODES_BYTE_TO_CHARACTER;
+ − 2909 case 2: return DECODES_CHARACTER_TO_BYTE;
+ − 2910 case 3: return DECODES_CHARACTER_TO_CHARACTER;
+ − 2911 }
+ − 2912
+ − 2913 abort ();
+ − 2914 return DECODES_BYTE_TO_BYTE;
+ − 2915 }
+ − 2916
+ − 2917
+ − 2918 /************************************************************************/
+ − 2919 /* No-conversion methods */
+ − 2920 /************************************************************************/
+ − 2921
+ − 2922 /* "No conversion"; used for binary files. We use quotes because there
+ − 2923 really is some conversion being applied (it does byte<->char
+ − 2924 conversion), but it appears to the user as if the text is read in
+ − 2925 without conversion. */
+ − 2926 DEFINE_CODING_SYSTEM_TYPE (no_conversion);
+ − 2927
+ − 2928 /* This is used when reading in "binary" files -- i.e. files that may
+ − 2929 contain all 256 possible byte values and that are not to be
+ − 2930 interpreted as being in any particular encoding. */
+ − 2931 static Bytecount
+ − 2932 no_conversion_convert (struct coding_stream *str,
+ − 2933 const UExtbyte *src,
+ − 2934 unsigned_char_dynarr *dst, Bytecount n)
+ − 2935 {
+ − 2936 UExtbyte c;
+ − 2937 unsigned int ch = str->ch;
+ − 2938 Bytecount orign = n;
+ − 2939
+ − 2940 if (str->direction == CODING_DECODE)
+ − 2941 {
+ − 2942 while (n--)
+ − 2943 {
+ − 2944 c = *src++;
+ − 2945
+ − 2946 DECODE_ADD_BINARY_CHAR (c, dst);
+ − 2947 }
+ − 2948
+ − 2949 if (str->eof)
+ − 2950 DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+ − 2951 }
+ − 2952 else
+ − 2953 {
+ − 2954
+ − 2955 while (n--)
+ − 2956 {
+ − 2957 c = *src++;
826
+ − 2958 if (byte_ascii_p (c))
771
+ − 2959 {
+ − 2960 assert (ch == 0);
+ − 2961 Dynarr_add (dst, c);
+ − 2962 }
+ − 2963 #ifdef MULE
867
+ − 2964 else if (ibyte_leading_byte_p (c))
771
+ − 2965 {
+ − 2966 assert (ch == 0);
+ − 2967 if (c == LEADING_BYTE_LATIN_ISO8859_1 ||
+ − 2968 c == LEADING_BYTE_CONTROL_1)
+ − 2969 ch = c;
+ − 2970 else
+ − 2971 Dynarr_add (dst, '~'); /* untranslatable character */
+ − 2972 }
+ − 2973 else
+ − 2974 {
+ − 2975 if (ch == LEADING_BYTE_LATIN_ISO8859_1)
+ − 2976 Dynarr_add (dst, c);
+ − 2977 else if (ch == LEADING_BYTE_CONTROL_1)
+ − 2978 {
+ − 2979 assert (c < 0xC0);
+ − 2980 Dynarr_add (dst, c - 0x20);
+ − 2981 }
+ − 2982 /* else it should be the second or third byte of an
+ − 2983 untranslatable character, so ignore it */
+ − 2984 ch = 0;
+ − 2985 }
+ − 2986 #endif /* MULE */
+ − 2987
+ − 2988 }
+ − 2989 }
+ − 2990
+ − 2991 str->ch = ch;
+ − 2992 return orign;
+ − 2993 }
+ − 2994
+ − 2995 DEFINE_DETECTOR (no_conversion);
+ − 2996 DEFINE_DETECTOR_CATEGORY (no_conversion, no_conversion);
+ − 2997
+ − 2998 struct no_conversion_detector
+ − 2999 {
+ − 3000 int dummy;
+ − 3001 };
+ − 3002
+ − 3003 static void
+ − 3004 no_conversion_detect (struct detection_state *st, const UExtbyte *src,
+ − 3005 Bytecount n)
+ − 3006 {
+ − 3007 /* Hack until we get better handling of this stuff! */
+ − 3008 DET_RESULT (st, no_conversion) = DET_SLIGHTLY_LIKELY;
+ − 3009 }
+ − 3010
+ − 3011
+ − 3012 /************************************************************************/
+ − 3013 /* Convert-eol methods */
+ − 3014 /************************************************************************/
+ − 3015
+ − 3016 /* This is used to handle end-of-line (EOL) differences. It is
+ − 3017 character-to-character, and works (when encoding) *BEFORE* sending
+ − 3018 data to the main encoding routine -- thus, that routine must handle
+ − 3019 different EOL types itself if it does line-oriented type processing.
+ − 3020 This is unavoidable because we don't know whether the output of the
+ − 3021 main encoding routine is ASCII compatible (Unicode is definitely not,
+ − 3022 for example).
+ − 3023
793
+ − 3024 There is one parameter: `subtype', either `cr', `lf', `crlf', or nil.
771
+ − 3025 */
+ − 3026
+ − 3027 DEFINE_CODING_SYSTEM_TYPE (convert_eol);
+ − 3028
+ − 3029 struct convert_eol_coding_system
+ − 3030 {
+ − 3031 enum eol_type subtype;
+ − 3032 };
+ − 3033
+ − 3034 #define CODING_SYSTEM_CONVERT_EOL_SUBTYPE(codesys) \
+ − 3035 (CODING_SYSTEM_TYPE_DATA (codesys, convert_eol)->subtype)
+ − 3036 #define XCODING_SYSTEM_CONVERT_EOL_SUBTYPE(codesys) \
+ − 3037 (XCODING_SYSTEM_TYPE_DATA (codesys, convert_eol)->subtype)
+ − 3038
+ − 3039 struct convert_eol_coding_stream
+ − 3040 {
+ − 3041 enum eol_type actual;
+ − 3042 };
+ − 3043
+ − 3044 static const struct lrecord_description
+ − 3045 convert_eol_coding_system_description[] = {
+ − 3046 { XD_END }
+ − 3047 };
+ − 3048
+ − 3049 static void
+ − 3050 convert_eol_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
+ − 3051 {
+ − 3052 struct convert_eol_coding_system *data =
+ − 3053 XCODING_SYSTEM_TYPE_DATA (cs, convert_eol);
+ − 3054
+ − 3055 write_fmt_string (printcharfun, "(%s)",
+ − 3056 data->subtype == EOL_LF ? "lf" :
+ − 3057 data->subtype == EOL_CRLF ? "crlf" :
+ − 3058 data->subtype == EOL_CR ? "cr" :
793
+ − 3059 data->subtype == EOL_AUTODETECT ? "nil" :
771
+ − 3060 (abort(), ""));
+ − 3061 }
+ − 3062
+ − 3063 static enum source_sink_type
+ − 3064 convert_eol_conversion_end_type (Lisp_Object codesys)
+ − 3065 {
+ − 3066 return DECODES_CHARACTER_TO_CHARACTER;
+ − 3067 }
+ − 3068
+ − 3069 static int
+ − 3070 convert_eol_putprop (Lisp_Object codesys,
+ − 3071 Lisp_Object key,
+ − 3072 Lisp_Object value)
+ − 3073 {
+ − 3074 struct convert_eol_coding_system *data =
+ − 3075 XCODING_SYSTEM_TYPE_DATA (codesys, convert_eol);
+ − 3076
+ − 3077 if (EQ (key, Qsubtype))
+ − 3078 {
+ − 3079 if (EQ (value, Qlf) /* || EQ (value, Qunix) */)
+ − 3080 data->subtype = EOL_LF;
+ − 3081 else if (EQ (value, Qcrlf) /* || EQ (value, Qdos) */)
+ − 3082 data->subtype = EOL_CRLF;
+ − 3083 else if (EQ (value, Qcr) /* || EQ (value, Qmac) */)
+ − 3084 data->subtype = EOL_CR;
793
+ − 3085 else if (EQ (value, Qnil))
771
+ − 3086 data->subtype = EOL_AUTODETECT;
+ − 3087 else invalid_constant ("Unrecognized eol type", value);
+ − 3088 }
+ − 3089 else
+ − 3090 return 0;
+ − 3091 return 1;
+ − 3092 }
+ − 3093
+ − 3094 static Lisp_Object
+ − 3095 convert_eol_getprop (Lisp_Object coding_system, Lisp_Object prop)
+ − 3096 {
+ − 3097 struct convert_eol_coding_system *data =
+ − 3098 XCODING_SYSTEM_TYPE_DATA (coding_system, convert_eol);
+ − 3099
+ − 3100 if (EQ (prop, Qsubtype))
+ − 3101 {
+ − 3102 switch (data->subtype)
+ − 3103 {
+ − 3104 case EOL_LF: return Qlf;
+ − 3105 case EOL_CRLF: return Qcrlf;
+ − 3106 case EOL_CR: return Qcr;
793
+ − 3107 case EOL_AUTODETECT: return Qnil;
771
+ − 3108 default: abort ();
+ − 3109 }
+ − 3110 }
+ − 3111
+ − 3112 return Qunbound;
+ − 3113 }
+ − 3114
+ − 3115 static void
+ − 3116 convert_eol_init_coding_stream (struct coding_stream *str)
+ − 3117 {
+ − 3118 struct convert_eol_coding_stream *data =
+ − 3119 CODING_STREAM_TYPE_DATA (str, convert_eol);
+ − 3120 data->actual = XCODING_SYSTEM_CONVERT_EOL_SUBTYPE (str->codesys);
+ − 3121 }
+ − 3122
+ − 3123 static Bytecount
867
+ − 3124 convert_eol_convert (struct coding_stream *str, const Ibyte *src,
771
+ − 3125 unsigned_char_dynarr *dst, Bytecount n)
+ − 3126 {
+ − 3127 if (str->direction == CODING_DECODE)
+ − 3128 {
+ − 3129 struct convert_eol_coding_stream *data =
+ − 3130 CODING_STREAM_TYPE_DATA (str, convert_eol);
+ − 3131
+ − 3132 if (data->actual == EOL_AUTODETECT)
+ − 3133 {
+ − 3134 Bytecount n2 = n;
867
+ − 3135 const Ibyte *src2 = src;
771
+ − 3136
+ − 3137 for (; n2; n2--)
+ − 3138 {
867
+ − 3139 Ibyte c = *src2++;
771
+ − 3140 if (c == '\n')
+ − 3141 {
+ − 3142 data->actual = EOL_LF;
+ − 3143 break;
+ − 3144 }
+ − 3145 else if (c == '\r')
+ − 3146 {
+ − 3147 if (n2 == 1)
+ − 3148 {
+ − 3149 /* If we're seeing a '\r' at the end of the data, then
+ − 3150 reject the '\r' right now so it doesn't become an
+ − 3151 issue in the code below -- unless we're at the end of
+ − 3152 the stream, in which case we can't do that (because
+ − 3153 then the '\r' will never get written out), and in any
+ − 3154 case we should be recognizing it at EOL_CR format. */
+ − 3155 if (str->eof)
+ − 3156 data->actual = EOL_CR;
+ − 3157 else
+ − 3158 n--;
+ − 3159 break;
+ − 3160 }
+ − 3161 else if (*src2 == '\n')
+ − 3162 data->actual = EOL_CRLF;
+ − 3163 else
+ − 3164 data->actual = EOL_CR;
+ − 3165 break;
+ − 3166 }
+ − 3167 }
+ − 3168 }
+ − 3169
+ − 3170 /* str->eof is set, the caller reached EOF on the other end and has
+ − 3171 no new data to give us. The only data we get is the data we
+ − 3172 rejected from last time. */
+ − 3173 if (data->actual == EOL_LF || data->actual == EOL_AUTODETECT ||
+ − 3174 (str->eof))
+ − 3175 Dynarr_add_many (dst, src, n);
+ − 3176 else
+ − 3177 {
867
+ − 3178 const Ibyte *end = src + n;
771
+ − 3179 while (1)
+ − 3180 {
+ − 3181 /* Find the next section with no \r and add it. */
867
+ − 3182 const Ibyte *runstart = src;
+ − 3183 src = (Ibyte *) memchr (src, '\r', end - src);
771
+ − 3184 if (!src)
+ − 3185 src = end;
+ − 3186 Dynarr_add_many (dst, runstart, src - runstart);
+ − 3187 /* Stop if at end ... */
+ − 3188 if (src == end)
+ − 3189 break;
+ − 3190 /* ... else, translate as necessary. */
+ − 3191 src++;
+ − 3192 if (data->actual == EOL_CR)
+ − 3193 Dynarr_add (dst, '\n');
+ − 3194 /* We need to be careful here with CRLF. If we see a CR at the
+ − 3195 end of the data, we don't know if it's part of a CRLF, so we
+ − 3196 reject it. Otherwise: If it's part of a CRLF, eat it and
+ − 3197 loop; the following LF gets added next time around. If it's
+ − 3198 not part of a CRLF, add the CR and loop. The following
+ − 3199 character will be processed in the next loop iteration. This
+ − 3200 correctly handles a sequence like CR+CR+LF. */
+ − 3201 else if (src == end)
+ − 3202 return n - 1; /* reject the CR at the end; we'll get it again
+ − 3203 next time the convert method is called */
+ − 3204 else if (*src != '\n')
+ − 3205 Dynarr_add (dst, '\r');
+ − 3206 }
+ − 3207 }
+ − 3208
+ − 3209 return n;
+ − 3210 }
+ − 3211 else
+ − 3212 {
+ − 3213 enum eol_type subtype =
+ − 3214 XCODING_SYSTEM_CONVERT_EOL_SUBTYPE (str->codesys);
867
+ − 3215 const Ibyte *end = src + n;
771
+ − 3216
+ − 3217 /* We try to be relatively efficient here. */
+ − 3218 if (subtype == EOL_LF)
+ − 3219 Dynarr_add_many (dst, src, n);
+ − 3220 else
+ − 3221 {
+ − 3222 while (1)
+ − 3223 {
+ − 3224 /* Find the next section with no \n and add it. */
867
+ − 3225 const Ibyte *runstart = src;
+ − 3226 src = (Ibyte *) memchr (src, '\n', end - src);
771
+ − 3227 if (!src)
+ − 3228 src = end;
+ − 3229 Dynarr_add_many (dst, runstart, src - runstart);
+ − 3230 /* Stop if at end ... */
+ − 3231 if (src == end)
+ − 3232 break;
+ − 3233 /* ... else, skip over \n and add its translation. */
+ − 3234 src++;
+ − 3235 Dynarr_add (dst, '\r');
+ − 3236 if (subtype == EOL_CRLF)
+ − 3237 Dynarr_add (dst, '\n');
+ − 3238 }
+ − 3239 }
+ − 3240
+ − 3241 return n;
+ − 3242 }
+ − 3243 }
+ − 3244
+ − 3245 static Lisp_Object
+ − 3246 convert_eol_canonicalize_after_coding (struct coding_stream *str)
+ − 3247 {
+ − 3248 struct convert_eol_coding_stream *data =
+ − 3249 CODING_STREAM_TYPE_DATA (str, convert_eol);
+ − 3250
+ − 3251 if (str->direction == CODING_ENCODE)
+ − 3252 return str->codesys;
+ − 3253
+ − 3254 switch (data->actual)
+ − 3255 {
+ − 3256 case EOL_LF: return Fget_coding_system (Qconvert_eol_lf);
+ − 3257 case EOL_CRLF: return Fget_coding_system (Qconvert_eol_crlf);
+ − 3258 case EOL_CR: return Fget_coding_system (Qconvert_eol_cr);
+ − 3259 case EOL_AUTODETECT: return str->codesys;
+ − 3260 default: abort (); return Qnil;
+ − 3261 }
+ − 3262 }
+ − 3263
+ − 3264
+ − 3265 /************************************************************************/
+ − 3266 /* Undecided methods */
+ − 3267 /************************************************************************/
+ − 3268
+ − 3269 /* Do autodetection. We can autodetect the EOL type only, the coding
+ − 3270 system only, or both. We only do autodetection when decoding; when
+ − 3271 encoding, we just pass the data through.
+ − 3272
+ − 3273 When doing just EOL detection, a coding system can be specified; if so,
+ − 3274 we will decode this data through the coding system before doing EOL
+ − 3275 detection. The reason for specifying this is so that
+ − 3276 canonicalize-after-coding works: We will canonicalize the specified
+ − 3277 coding system into the appropriate EOL type. When doing both coding and
+ − 3278 EOL detection, we do similar canonicalization, and also catch situations
+ − 3279 where the EOL type is overspecified, i.e. the detected coding system
+ − 3280 specifies an EOL type, and either switch to the equivalent
+ − 3281 non-EOL-processing coding system (if possible), or terminate EOL
+ − 3282 detection and use the specified EOL type. This prevents data from being
+ − 3283 EOL-processed twice.
+ − 3284 */
+ − 3285
+ − 3286 DEFINE_CODING_SYSTEM_TYPE (undecided);
+ − 3287
+ − 3288 struct undecided_coding_system
+ − 3289 {
+ − 3290 int do_eol, do_coding;
+ − 3291 Lisp_Object cs;
+ − 3292 };
+ − 3293
+ − 3294 struct undecided_coding_stream
+ − 3295 {
+ − 3296 Lisp_Object actual;
+ − 3297 /* Either 2 or 3 lstreams here; see undecided_convert */
+ − 3298 struct chain_coding_stream c;
+ − 3299
+ − 3300 struct detection_state *st;
+ − 3301 };
+ − 3302
+ − 3303 static const struct lrecord_description
+ − 3304 undecided_coding_system_description[] = {
+ − 3305 { XD_LISP_OBJECT,
+ − 3306 coding_system_data_offset + offsetof (struct undecided_coding_system,
+ − 3307 cs) },
+ − 3308 { XD_END }
+ − 3309 };
+ − 3310
+ − 3311 static void
+ − 3312 undecided_init (Lisp_Object codesys)
+ − 3313 {
+ − 3314 struct undecided_coding_system *data =
+ − 3315 XCODING_SYSTEM_TYPE_DATA (codesys, undecided);
+ − 3316
+ − 3317 data->cs = Qnil;
+ − 3318 }
+ − 3319
+ − 3320 static void
+ − 3321 undecided_mark (Lisp_Object codesys)
+ − 3322 {
+ − 3323 struct undecided_coding_system *data =
+ − 3324 XCODING_SYSTEM_TYPE_DATA (codesys, undecided);
+ − 3325
+ − 3326 mark_object (data->cs);
+ − 3327 }
+ − 3328
+ − 3329 static void
+ − 3330 undecided_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
+ − 3331 {
+ − 3332 struct undecided_coding_system *data =
+ − 3333 XCODING_SYSTEM_TYPE_DATA (cs, undecided);
+ − 3334 int need_space = 0;
+ − 3335
826
+ − 3336 write_c_string (printcharfun, "(");
771
+ − 3337 if (data->do_eol)
+ − 3338 {
826
+ − 3339 write_c_string (printcharfun, "do-eol");
771
+ − 3340 need_space = 1;
+ − 3341 }
+ − 3342 if (data->do_coding)
+ − 3343 {
+ − 3344 if (need_space)
826
+ − 3345 write_c_string (printcharfun, " ");
+ − 3346 write_c_string (printcharfun, "do-coding");
771
+ − 3347 need_space = 1;
+ − 3348 }
+ − 3349 if (!NILP (data->cs))
+ − 3350 {
+ − 3351 if (need_space)
826
+ − 3352 write_c_string (printcharfun, " ");
+ − 3353 write_c_string (printcharfun, "coding-system=");
771
+ − 3354 print_coding_system_in_print_method (data->cs, printcharfun, escapeflag);
+ − 3355 }
826
+ − 3356 write_c_string (printcharfun, ")");
771
+ − 3357 }
+ − 3358
+ − 3359 static void
+ − 3360 undecided_mark_coding_stream (struct coding_stream *str)
+ − 3361 {
+ − 3362 chain_mark_coding_stream_1 (&CODING_STREAM_TYPE_DATA (str, undecided)->c);
+ − 3363 }
+ − 3364
+ − 3365 static int
+ − 3366 undecided_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
+ − 3367 {
+ − 3368 struct undecided_coding_system *data =
+ − 3369 XCODING_SYSTEM_TYPE_DATA (codesys, undecided);
+ − 3370
+ − 3371 if (EQ (key, Qdo_eol))
+ − 3372 data->do_eol = 1;
+ − 3373 else if (EQ (key, Qdo_coding))
+ − 3374 data->do_coding = 1;
+ − 3375 else if (EQ (key, Qcoding_system))
+ − 3376 data->cs = get_coding_system_for_text_file (value, 0);
+ − 3377 else
+ − 3378 return 0;
+ − 3379 return 1;
+ − 3380 }
+ − 3381
+ − 3382 static Lisp_Object
+ − 3383 undecided_getprop (Lisp_Object codesys, Lisp_Object prop)
+ − 3384 {
+ − 3385 struct undecided_coding_system *data =
+ − 3386 XCODING_SYSTEM_TYPE_DATA (codesys, undecided);
+ − 3387
+ − 3388 if (EQ (prop, Qdo_eol))
+ − 3389 return data->do_eol ? Qt : Qnil;
+ − 3390 if (EQ (prop, Qdo_coding))
+ − 3391 return data->do_coding ? Qt : Qnil;
+ − 3392 if (EQ (prop, Qcoding_system))
+ − 3393 return data->cs;
+ − 3394 return Qunbound;
+ − 3395 }
+ − 3396
+ − 3397 static struct detection_state *
+ − 3398 allocate_detection_state (void)
+ − 3399 {
+ − 3400 int i;
+ − 3401 Bytecount size = MAX_ALIGN_SIZE (sizeof (struct detection_state));
+ − 3402 struct detection_state *block;
+ − 3403
+ − 3404 for (i = 0; i < coding_detector_count; i++)
+ − 3405 size += MAX_ALIGN_SIZE (Dynarr_at (all_coding_detectors, i).data_size);
+ − 3406
+ − 3407 block = (struct detection_state *) xmalloc_and_zero (size);
+ − 3408
+ − 3409 size = MAX_ALIGN_SIZE (sizeof (struct detection_state));
+ − 3410 for (i = 0; i < coding_detector_count; i++)
+ − 3411 {
+ − 3412 block->data_offset[i] = size;
+ − 3413 size += MAX_ALIGN_SIZE (Dynarr_at (all_coding_detectors, i).data_size);
+ − 3414 }
+ − 3415
+ − 3416 return block;
+ − 3417 }
+ − 3418
+ − 3419 static void
+ − 3420 free_detection_state (struct detection_state *st)
+ − 3421 {
+ − 3422 int i;
+ − 3423
+ − 3424 for (i = 0; i < coding_detector_count; i++)
+ − 3425 {
+ − 3426 if (Dynarr_at (all_coding_detectors, i).finalize_detection_state_method)
+ − 3427 Dynarr_at (all_coding_detectors, i).finalize_detection_state_method
+ − 3428 (st);
+ − 3429 }
+ − 3430
+ − 3431 xfree (st);
+ − 3432 }
+ − 3433
+ − 3434 static int
+ − 3435 coding_category_symbol_to_id (Lisp_Object symbol)
428
+ − 3436 {
+ − 3437 int i;
+ − 3438
+ − 3439 CHECK_SYMBOL (symbol);
771
+ − 3440 for (i = 0; i < coding_detector_count; i++)
+ − 3441 {
+ − 3442 detector_category_dynarr *cats =
+ − 3443 Dynarr_at (all_coding_detectors, i).cats;
+ − 3444 int j;
+ − 3445
+ − 3446 for (j = 0; j < Dynarr_length (cats); j++)
+ − 3447 if (EQ (Dynarr_at (cats, j).sym, symbol))
+ − 3448 return Dynarr_at (cats, j).id;
+ − 3449 }
+ − 3450
563
+ − 3451 invalid_constant ("Unrecognized coding category", symbol);
801
+ − 3452 RETURN_NOT_REACHED (0)
428
+ − 3453 }
+ − 3454
771
+ − 3455 static Lisp_Object
+ − 3456 coding_category_id_to_symbol (int id)
428
+ − 3457 {
+ − 3458 int i;
771
+ − 3459
+ − 3460 for (i = 0; i < coding_detector_count; i++)
+ − 3461 {
+ − 3462 detector_category_dynarr *cats =
+ − 3463 Dynarr_at (all_coding_detectors, i).cats;
+ − 3464 int j;
+ − 3465
+ − 3466 for (j = 0; j < Dynarr_length (cats); j++)
+ − 3467 if (id == Dynarr_at (cats, j).id)
+ − 3468 return Dynarr_at (cats, j).sym;
+ − 3469 }
+ − 3470
+ − 3471 abort ();
+ − 3472 return Qnil; /* (usually) not reached */
428
+ − 3473 }
+ − 3474
771
+ − 3475 static Lisp_Object
+ − 3476 detection_result_number_to_symbol (enum detection_result result)
428
+ − 3477 {
771
+ − 3478 #define FROB(sym, num) if (result == num) return (sym)
+ − 3479 FROB (Qnear_certainty, DET_NEAR_CERTAINTY);
+ − 3480 FROB (Qquite_probable, DET_QUITE_PROBABLE);
+ − 3481 FROB (Qsomewhat_likely, DET_SOMEWHAT_LIKELY);
+ − 3482 FROB (Qas_likely_as_unlikely, DET_AS_LIKELY_AS_UNLIKELY);
+ − 3483 FROB (Qsomewhat_unlikely, DET_SOMEWHAT_UNLIKELY);
+ − 3484 FROB (Qquite_improbable, DET_QUITE_IMPROBABLE);
+ − 3485 FROB (Qnearly_impossible, DET_NEARLY_IMPOSSIBLE);
+ − 3486 #undef FROB
+ − 3487
+ − 3488 abort ();
+ − 3489 return Qnil; /* (usually) not reached */
+ − 3490 }
+ − 3491
778
+ − 3492 #if 0 /* not used */
771
+ − 3493 static enum detection_result
+ − 3494 detection_result_symbol_to_number (Lisp_Object symbol)
+ − 3495 {
+ − 3496 #define FROB(sym, num) if (EQ (symbol, sym)) return (num)
+ − 3497 FROB (Qnear_certainty, DET_NEAR_CERTAINTY);
+ − 3498 FROB (Qquite_probable, DET_QUITE_PROBABLE);
+ − 3499 FROB (Qsomewhat_likely, DET_SOMEWHAT_LIKELY);
+ − 3500 FROB (Qas_likely_as_unlikely, DET_AS_LIKELY_AS_UNLIKELY);
+ − 3501 FROB (Qsomewhat_unlikely, DET_SOMEWHAT_UNLIKELY);
+ − 3502 FROB (Qquite_improbable, DET_QUITE_IMPROBABLE);
+ − 3503 FROB (Qnearly_impossible, DET_NEARLY_IMPOSSIBLE);
+ − 3504 #undef FROB
+ − 3505
+ − 3506 invalid_constant ("Unrecognized detection result", symbol);
+ − 3507 return ((enum detection_result) 0); /* not reached */
+ − 3508 }
778
+ − 3509 #endif /* 0 */
771
+ − 3510
+ − 3511 /* Set all detection results for a given detector to a specified value. */
+ − 3512 void
+ − 3513 set_detection_results (struct detection_state *st, int detector, int given)
+ − 3514 {
+ − 3515 detector_category_dynarr *cats =
+ − 3516 Dynarr_at (all_coding_detectors, detector).cats;
+ − 3517 int i;
+ − 3518
+ − 3519 for (i = 0; i < Dynarr_length (cats); i++)
+ − 3520 st->categories[Dynarr_at (cats, i).id] = given;
+ − 3521 }
428
+ − 3522
+ − 3523 static int
+ − 3524 acceptable_control_char_p (int c)
+ − 3525 {
+ − 3526 switch (c)
+ − 3527 {
+ − 3528 /* Allow and ignore control characters that you might
+ − 3529 reasonably see in a text file */
+ − 3530 case '\r':
+ − 3531 case '\n':
+ − 3532 case '\t':
+ − 3533 case 7: /* bell */
+ − 3534 case 8: /* backspace */
+ − 3535 case 11: /* vertical tab */
+ − 3536 case 12: /* form feed */
+ − 3537 case 26: /* MS-DOS C-z junk */
+ − 3538 case 31: /* '^_' -- for info */
+ − 3539 return 1;
+ − 3540 default:
+ − 3541 return 0;
+ − 3542 }
+ − 3543 }
+ − 3544
771
+ − 3545 #ifdef DEBUG_XEMACS
+ − 3546
+ − 3547 static UExtbyte
+ − 3548 hex_digit_to_char (int digit)
428
+ − 3549 {
771
+ − 3550 if (digit < 10)
+ − 3551 return digit + '0';
+ − 3552 else
+ − 3553 return digit - 10 + 'A';
428
+ − 3554 }
+ − 3555
771
+ − 3556 static void
+ − 3557 output_bytes_in_ascii_and_hex (const UExtbyte *src, Bytecount n)
428
+ − 3558 {
771
+ − 3559 UExtbyte *ascii = alloca_array (UExtbyte, n + 1);
+ − 3560 UExtbyte *hex = alloca_array (UExtbyte, 3 * n + 1);
+ − 3561 int i;
+ − 3562
+ − 3563 for (i = 0; i < n; i++)
428
+ − 3564 {
771
+ − 3565 UExtbyte c = src[i];
+ − 3566 if (c < 0x20)
+ − 3567 ascii[i] = '.';
428
+ − 3568 else
771
+ − 3569 ascii[i] = c;
+ − 3570 hex[3 * i] = hex_digit_to_char (c >> 4);
+ − 3571 hex[3 * i + 1] = hex_digit_to_char (c & 0xF);
+ − 3572 hex[3 * i + 2] = ' ';
428
+ − 3573 }
771
+ − 3574 ascii[i] = '\0';
+ − 3575 hex[3 * i - 1] = '\0';
+ − 3576 stderr_out ("%s %s", ascii, hex);
428
+ − 3577 }
+ − 3578
771
+ − 3579 #endif /* DEBUG_XEMACS */
+ − 3580
+ − 3581 /* Attempt to determine the encoding of the given text. Before calling
+ − 3582 this function for the first time, you must zero out the detection state.
428
+ − 3583
+ − 3584 Returns:
+ − 3585
771
+ − 3586 0 == keep going
+ − 3587 1 == stop
428
+ − 3588 */
+ − 3589
+ − 3590 static int
771
+ − 3591 detect_coding_type (struct detection_state *st, const UExtbyte *src,
+ − 3592 Bytecount n)
428
+ − 3593 {
771
+ − 3594 Bytecount n2 = n;
+ − 3595 const UExtbyte *src2 = src;
+ − 3596 int i;
+ − 3597
+ − 3598 #ifdef DEBUG_XEMACS
+ − 3599 if (!NILP (Vdebug_coding_detection))
+ − 3600 {
+ − 3601 int bytes = min (16, n);
+ − 3602 stderr_out ("detect_coding_type: processing %ld bytes\n", n);
+ − 3603 stderr_out ("First %d: ", bytes);
+ − 3604 output_bytes_in_ascii_and_hex (src, bytes);
+ − 3605 stderr_out ("\nLast %d: ", bytes);
+ − 3606 output_bytes_in_ascii_and_hex (src + n - bytes, bytes);
+ − 3607 stderr_out ("\n");
+ − 3608 }
+ − 3609 #endif /* DEBUG_XEMACS */
428
+ − 3610 if (!st->seen_non_ascii)
+ − 3611 {
771
+ − 3612 for (; n2; n2--, src2++)
428
+ − 3613 {
771
+ − 3614 UExtbyte c = *src2;
428
+ − 3615 if ((c < 0x20 && !acceptable_control_char_p (c)) || c >= 0x80)
+ − 3616 {
+ − 3617 st->seen_non_ascii = 1;
+ − 3618 break;
+ − 3619 }
+ − 3620 }
+ − 3621 }
+ − 3622
771
+ − 3623 for (i = 0; i < coding_detector_count; i++)
+ − 3624 Dynarr_at (all_coding_detectors, i).detect_method (st, src, n);
+ − 3625
+ − 3626 st->bytes_seen += n;
+ − 3627
+ − 3628 #ifdef DEBUG_XEMACS
+ − 3629 if (!NILP (Vdebug_coding_detection))
+ − 3630 {
+ − 3631 stderr_out ("seen_non_ascii: %d\n", st->seen_non_ascii);
+ − 3632 for (i = 0; i < coding_detector_category_count; i++)
+ − 3633 stderr_out_lisp
+ − 3634 ("%s: %s\n",
+ − 3635 2,
+ − 3636 coding_category_id_to_symbol (i),
+ − 3637 detection_result_number_to_symbol ((enum detection_result)
+ − 3638 st->categories[i]));
+ − 3639 }
+ − 3640 #endif /* DEBUG_XEMACS */
+ − 3641
+ − 3642 {
+ − 3643 int not_unlikely = 0;
+ − 3644 int retval;
+ − 3645
+ − 3646 for (i = 0; i < coding_detector_category_count; i++)
+ − 3647 if (st->categories[i] >= 0)
+ − 3648 not_unlikely++;
+ − 3649
+ − 3650 retval = (not_unlikely <= 1
+ − 3651 #if 0 /* this is bogus */
+ − 3652 || st->bytes_seen >= MAX_BYTES_PROCESSED_FOR_DETECTION
428
+ − 3653 #endif
771
+ − 3654 );
+ − 3655
+ − 3656 #ifdef DEBUG_XEMACS
+ − 3657 if (!NILP (Vdebug_coding_detection))
+ − 3658 stderr_out ("detect_coding_type: returning %d (%s)\n",
+ − 3659 retval, retval ? "stop" : "keep going");
+ − 3660 #endif /* DEBUG_XEMACS */
+ − 3661
+ − 3662 return retval;
428
+ − 3663 }
+ − 3664 }
+ − 3665
+ − 3666 static Lisp_Object
771
+ − 3667 detected_coding_system (struct detection_state *st)
428
+ − 3668 {
771
+ − 3669 int i;
+ − 3670 int even = 1;
+ − 3671
+ − 3672 if (st->seen_non_ascii)
+ − 3673 {
+ − 3674 for (i = 0; i < coding_detector_category_count; i++)
+ − 3675 if (st->categories[i] != DET_AS_LIKELY_AS_UNLIKELY)
+ − 3676 {
+ − 3677 even = 0;
+ − 3678 break;
+ − 3679 }
+ − 3680 }
+ − 3681
+ − 3682 /* #### Here we are ignoring the results of detection when it's all
+ − 3683 ASCII. This is obviously a bad thing. But we need to fix up the
+ − 3684 existing detection methods somewhat before we can switch. */
+ − 3685 if (even)
428
+ − 3686 {
+ − 3687 /* If the file was entirely or basically ASCII, use the
+ − 3688 default value of `buffer-file-coding-system'. */
+ − 3689 Lisp_Object retval =
+ − 3690 XBUFFER (Vbuffer_defaults)->buffer_file_coding_system;
+ − 3691 if (!NILP (retval))
+ − 3692 {
771
+ − 3693 retval = find_coding_system_for_text_file (retval, 0);
428
+ − 3694 if (NILP (retval))
+ − 3695 {
+ − 3696 warn_when_safe
+ − 3697 (Qbad_variable, Qwarning,
+ − 3698 "Invalid `default-buffer-file-coding-system', set to nil");
+ − 3699 XBUFFER (Vbuffer_defaults)->buffer_file_coding_system = Qnil;
+ − 3700 }
+ − 3701 }
+ − 3702 if (NILP (retval))
+ − 3703 retval = Fget_coding_system (Qraw_text);
+ − 3704 return retval;
+ − 3705 }
+ − 3706 else
+ − 3707 {
771
+ − 3708 int likelihood;
+ − 3709 Lisp_Object retval = Qnil;
+ − 3710
+ − 3711 /* Look through the coding categories first by likelihood and then by
+ − 3712 priority and find the first one that is allowed. */
+ − 3713
+ − 3714 for (likelihood = DET_HIGHEST; likelihood >= DET_LOWEST; likelihood--)
428
+ − 3715 {
771
+ − 3716 for (i = 0; i < coding_detector_category_count; i++)
+ − 3717 {
+ − 3718 int cat = coding_category_by_priority[i];
+ − 3719 if (st->categories[cat] == likelihood &&
+ − 3720 !NILP (coding_category_system[cat]))
+ − 3721 {
+ − 3722 retval = (get_coding_system_for_text_file
+ − 3723 (coding_category_system[cat], 0));
+ − 3724 if (likelihood < DET_AS_LIKELY_AS_UNLIKELY)
+ − 3725 warn_when_safe_lispobj
+ − 3726 (intern ("detection"),
793
+ − 3727 Qwarning,
771
+ − 3728 emacs_sprintf_string_lisp
+ − 3729 (
+ − 3730 "Detected coding %s is unlikely to be correct (likelihood == `%s')",
+ − 3731 Qnil, 2, XCODING_SYSTEM_NAME (retval),
+ − 3732 detection_result_number_to_symbol
+ − 3733 ((enum detection_result) likelihood)));
+ − 3734 return retval;
+ − 3735 }
+ − 3736 }
428
+ − 3737 }
771
+ − 3738
+ − 3739 return Fget_coding_system (Qraw_text);
428
+ − 3740 }
+ − 3741 }
+ − 3742
+ − 3743 /* Given a seekable read stream and potential coding system and EOL type
+ − 3744 as specified, do any autodetection that is called for. If the
+ − 3745 coding system and/or EOL type are not `autodetect', they will be left
+ − 3746 alone; but this function will never return an autodetect coding system
+ − 3747 or EOL type.
+ − 3748
+ − 3749 This function does not automatically fetch subsidiary coding systems;
+ − 3750 that should be unnecessary with the explicit eol-type argument. */
+ − 3751
+ − 3752 #define LENGTH(string_constant) (sizeof (string_constant) - 1)
+ − 3753
771
+ − 3754 static Lisp_Object
+ − 3755 unwind_free_detection_state (Lisp_Object opaque)
+ − 3756 {
+ − 3757 struct detection_state *st =
+ − 3758 (struct detection_state *) get_opaque_ptr (opaque);
+ − 3759 free_detection_state (st);
+ − 3760 free_opaque_ptr (opaque);
+ − 3761 return Qnil;
+ − 3762 }
+ − 3763
+ − 3764 static Lisp_Object
+ − 3765 look_for_coding_system_magic_cookie (const UExtbyte *data, Bytecount len)
428
+ − 3766 {
771
+ − 3767 Lisp_Object coding_system = Qnil;
+ − 3768 const UExtbyte *p;
+ − 3769 const UExtbyte *scan_end;
+ − 3770
+ − 3771 /* Look for initial "-*-"; mode line prefix */
+ − 3772 for (p = data,
+ − 3773 scan_end = data + len - LENGTH ("-*-coding:?-*-");
+ − 3774 p <= scan_end
+ − 3775 && *p != '\n'
+ − 3776 && *p != '\r';
+ − 3777 p++)
+ − 3778 if (*p == '-' && *(p+1) == '*' && *(p+2) == '-')
+ − 3779 {
+ − 3780 const UExtbyte *local_vars_beg = p + 3;
+ − 3781 /* Look for final "-*-"; mode line suffix */
+ − 3782 for (p = local_vars_beg,
+ − 3783 scan_end = data + len - LENGTH ("-*-");
+ − 3784 p <= scan_end
428
+ − 3785 && *p != '\n'
+ − 3786 && *p != '\r';
771
+ − 3787 p++)
+ − 3788 if (*p == '-' && *(p+1) == '*' && *(p+2) == '-')
+ − 3789 {
+ − 3790 const UExtbyte *suffix = p;
+ − 3791 /* Look for "coding:" */
+ − 3792 for (p = local_vars_beg,
+ − 3793 scan_end = suffix - LENGTH ("coding:?");
+ − 3794 p <= scan_end;
+ − 3795 p++)
+ − 3796 if (memcmp ("coding:", p, LENGTH ("coding:")) == 0
+ − 3797 && (p == local_vars_beg
+ − 3798 || (*(p-1) == ' ' ||
+ − 3799 *(p-1) == '\t' ||
+ − 3800 *(p-1) == ';')))
+ − 3801 {
+ − 3802 Bytecount n;
867
+ − 3803 Ibyte *name;
771
+ − 3804
+ − 3805 p += LENGTH ("coding:");
+ − 3806 while (*p == ' ' || *p == '\t') p++;
867
+ − 3807 name = alloca_ibytes (suffix - p + 1);
771
+ − 3808 memcpy (name, p, suffix - p);
+ − 3809 name[suffix - p] = '\0';
+ − 3810
+ − 3811 /* Get coding system name */
+ − 3812 /* Characters valid in a MIME charset name (rfc 1521),
+ − 3813 and in a Lisp symbol name. */
+ − 3814 n = qxestrspn (name,
+ − 3815 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ − 3816 "abcdefghijklmnopqrstuvwxyz"
+ − 3817 "0123456789"
+ − 3818 "!$%&*+-.^_{|}~");
+ − 3819 if (n > 0)
428
+ − 3820 {
771
+ − 3821 name[n] = '\0';
+ − 3822 coding_system =
+ − 3823 find_coding_system_for_text_file (intern_int (name),
+ − 3824 0);
428
+ − 3825 }
771
+ − 3826 break;
+ − 3827 }
+ − 3828 break;
+ − 3829 }
+ − 3830 break;
+ − 3831 }
+ − 3832
+ − 3833 return coding_system;
+ − 3834 }
+ − 3835
+ − 3836 static Lisp_Object
+ − 3837 determine_real_coding_system (Lstream *stream)
+ − 3838 {
+ − 3839 struct detection_state *st = allocate_detection_state ();
+ − 3840 int depth = record_unwind_protect (unwind_free_detection_state,
+ − 3841 make_opaque_ptr (st));
+ − 3842 UExtbyte buf[4096];
+ − 3843 Bytecount nread = Lstream_read (stream, buf, sizeof (buf));
+ − 3844 Lisp_Object coding_system = look_for_coding_system_magic_cookie (buf, nread);
+ − 3845
+ − 3846 if (NILP (coding_system))
+ − 3847 {
+ − 3848 while (1)
+ − 3849 {
+ − 3850 if (detect_coding_type (st, buf, nread))
428
+ − 3851 break;
771
+ − 3852 nread = Lstream_read (stream, buf, sizeof (buf));
+ − 3853 if (nread == 0)
+ − 3854 break;
428
+ − 3855 }
771
+ − 3856
+ − 3857 coding_system = detected_coding_system (st);
428
+ − 3858 }
+ − 3859
+ − 3860 Lstream_rewind (stream);
771
+ − 3861
+ − 3862 unbind_to (depth);
+ − 3863 return coding_system;
+ − 3864 }
+ − 3865
+ − 3866 static void
+ − 3867 undecided_init_coding_stream (struct coding_stream *str)
+ − 3868 {
+ − 3869 struct undecided_coding_stream *data =
+ − 3870 CODING_STREAM_TYPE_DATA (str, undecided);
+ − 3871 struct undecided_coding_system *csdata =
+ − 3872 XCODING_SYSTEM_TYPE_DATA (str->codesys, undecided);
+ − 3873
+ − 3874 data->actual = Qnil;
+ − 3875
+ − 3876 if (str->direction == CODING_DECODE)
+ − 3877 {
+ − 3878 Lstream *lst = str->other_end;
+ − 3879
+ − 3880 if ((lst->flags & LSTREAM_FL_READ) &&
+ − 3881 Lstream_seekable_p (lst) &&
+ − 3882 csdata->do_coding)
+ − 3883 /* We can determine the coding system now. */
+ − 3884 data->actual = determine_real_coding_system (lst);
+ − 3885 }
+ − 3886 }
+ − 3887
+ − 3888 static void
+ − 3889 undecided_rewind_coding_stream (struct coding_stream *str)
+ − 3890 {
+ − 3891 chain_rewind_coding_stream_1 (&CODING_STREAM_TYPE_DATA (str, undecided)->c);
+ − 3892 }
+ − 3893
+ − 3894 static void
+ − 3895 undecided_finalize_coding_stream (struct coding_stream *str)
+ − 3896 {
+ − 3897 struct undecided_coding_stream *data =
+ − 3898 CODING_STREAM_TYPE_DATA (str, undecided);
+ − 3899
+ − 3900 chain_finalize_coding_stream_1
+ − 3901 (&CODING_STREAM_TYPE_DATA (str, undecided)->c);
+ − 3902 if (data->st)
+ − 3903 free_detection_state (data->st);
+ − 3904 }
+ − 3905
+ − 3906 static Lisp_Object
+ − 3907 undecided_canonicalize (Lisp_Object codesys)
+ − 3908 {
+ − 3909 struct undecided_coding_system *csdata =
+ − 3910 XCODING_SYSTEM_TYPE_DATA (codesys, undecided);
+ − 3911 if (!csdata->do_eol && !csdata->do_coding)
+ − 3912 return NILP (csdata->cs) ? Fget_coding_system (Qbinary) : csdata->cs;
+ − 3913 if (csdata->do_eol && !csdata->do_coding && NILP (csdata->cs))
+ − 3914 return Fget_coding_system (Qconvert_eol_autodetect);
+ − 3915 return codesys;
+ − 3916 }
+ − 3917
+ − 3918 static Bytecount
+ − 3919 undecided_convert (struct coding_stream *str, const UExtbyte *src,
+ − 3920 unsigned_char_dynarr *dst, Bytecount n)
+ − 3921 {
+ − 3922 int first_time = 0;
+ − 3923
+ − 3924 if (str->direction == CODING_DECODE)
+ − 3925 {
+ − 3926 /* At this point, we have only the following possibilities:
+ − 3927
+ − 3928 do_eol && do_coding
+ − 3929 do_coding only
+ − 3930 do_eol only and a coding system was specified
+ − 3931
+ − 3932 Other possibilities are removed during undecided_canonicalize.
+ − 3933
+ − 3934 Therefore, our substreams are either
+ − 3935
+ − 3936 lstream_coding -> lstream_dynarr, or
+ − 3937 lstream_coding -> lstream_eol -> lstream_dynarr.
+ − 3938 */
+ − 3939 struct undecided_coding_system *csdata =
+ − 3940 XCODING_SYSTEM_TYPE_DATA (str->codesys, undecided);
+ − 3941 struct undecided_coding_stream *data =
+ − 3942 CODING_STREAM_TYPE_DATA (str, undecided);
+ − 3943
+ − 3944 if (str->eof)
+ − 3945 {
+ − 3946 /* Each will close the next. We need to close now because more
+ − 3947 data may be generated. */
+ − 3948 if (data->c.initted)
+ − 3949 Lstream_close (XLSTREAM (data->c.lstreams[0]));
+ − 3950 return n;
+ − 3951 }
+ − 3952
+ − 3953 if (!data->c.initted)
+ − 3954 {
+ − 3955 data->c.lstream_count = csdata->do_eol ? 3 : 2;
+ − 3956 data->c.lstreams = xnew_array (Lisp_Object, data->c.lstream_count);
+ − 3957
+ − 3958 data->c.lstreams[data->c.lstream_count - 1] =
+ − 3959 make_dynarr_output_stream (dst);
+ − 3960 Lstream_set_buffering
+ − 3961 (XLSTREAM (data->c.lstreams[data->c.lstream_count - 1]),
+ − 3962 LSTREAM_UNBUFFERED, 0);
+ − 3963 if (csdata->do_eol)
+ − 3964 {
+ − 3965 data->c.lstreams[1] =
+ − 3966 make_coding_output_stream
+ − 3967 (XLSTREAM (data->c.lstreams[data->c.lstream_count - 1]),
+ − 3968 Fget_coding_system (Qconvert_eol_autodetect),
800
+ − 3969 CODING_DECODE, 0);
771
+ − 3970 Lstream_set_buffering
+ − 3971 (XLSTREAM (data->c.lstreams[1]),
+ − 3972 LSTREAM_UNBUFFERED, 0);
+ − 3973 }
+ − 3974
+ − 3975 data->c.lstreams[0] =
+ − 3976 make_coding_output_stream
+ − 3977 (XLSTREAM (data->c.lstreams[1]),
+ − 3978 /* Substitute binary if we need to detect the encoding */
+ − 3979 csdata->do_coding ? Qbinary : csdata->cs,
800
+ − 3980 CODING_DECODE, 0);
771
+ − 3981 Lstream_set_buffering (XLSTREAM (data->c.lstreams[0]),
+ − 3982 LSTREAM_UNBUFFERED, 0);
+ − 3983
+ − 3984 first_time = 1;
+ − 3985 data->c.initted = 1;
+ − 3986 }
+ − 3987
+ − 3988 /* If necessary, do encoding-detection now. We do this when we're a
+ − 3989 writing stream or a non-seekable reading stream, meaning that we
+ − 3990 can't just process the whole input, rewind, and start over. */
+ − 3991
+ − 3992 if (csdata->do_coding)
+ − 3993 {
+ − 3994 int actual_was_nil = NILP (data->actual);
+ − 3995 if (NILP (data->actual))
+ − 3996 {
+ − 3997 if (!data->st)
+ − 3998 data->st = allocate_detection_state ();
+ − 3999 if (first_time)
+ − 4000 /* #### This is cheesy. What we really ought to do is buffer
+ − 4001 up a certain minimum amount of data to get a better result.
+ − 4002 */
+ − 4003 data->actual = look_for_coding_system_magic_cookie (src, n);
+ − 4004 if (NILP (data->actual))
+ − 4005 {
+ − 4006 /* #### This is cheesy. What we really ought to do is buffer
+ − 4007 up a certain minimum amount of data so as to get a less
+ − 4008 random result when doing subprocess detection. */
+ − 4009 detect_coding_type (data->st, src, n);
+ − 4010 data->actual = detected_coding_system (data->st);
+ − 4011 }
+ − 4012 }
+ − 4013 /* We need to set the detected coding system if we actually have
+ − 4014 such a coding system but didn't before. That is the case
+ − 4015 either when we just detected it in the previous code or when
+ − 4016 it was detected during undecided_init_coding_stream(). We
+ − 4017 can check for that using first_time. */
+ − 4018 if (!NILP (data->actual) && (actual_was_nil || first_time))
+ − 4019 {
+ − 4020 /* If the detected coding system doesn't allow for EOL
+ − 4021 autodetection, try to get the equivalent that does;
+ − 4022 otherwise, disable EOL detection (overriding whatever
+ − 4023 may already have been detected). */
+ − 4024 if (XCODING_SYSTEM_EOL_TYPE (data->actual) != EOL_AUTODETECT)
+ − 4025 {
+ − 4026 if (!NILP (XCODING_SYSTEM_SUBSIDIARY_PARENT (data->actual)))
+ − 4027 data->actual =
+ − 4028 XCODING_SYSTEM_SUBSIDIARY_PARENT (data->actual);
+ − 4029 else if (data->c.lstream_count == 3)
+ − 4030 set_coding_stream_coding_system
+ − 4031 (XLSTREAM (data->c.lstreams[1]),
+ − 4032 Fget_coding_system (Qidentity));
+ − 4033 }
+ − 4034 set_coding_stream_coding_system
+ − 4035 (XLSTREAM (data->c.lstreams[0]), data->actual);
+ − 4036 }
+ − 4037 }
+ − 4038
+ − 4039 if (Lstream_write (XLSTREAM (data->c.lstreams[0]), src, n) < 0)
+ − 4040 return -1;
+ − 4041 return n;
+ − 4042 }
+ − 4043 else
+ − 4044 return no_conversion_convert (str, src, dst, n);
+ − 4045 }
+ − 4046
+ − 4047 static Lisp_Object
+ − 4048 undecided_canonicalize_after_coding (struct coding_stream *str)
+ − 4049 {
+ − 4050 struct undecided_coding_stream *data =
+ − 4051 CODING_STREAM_TYPE_DATA (str, undecided);
+ − 4052 Lisp_Object ret, eolret;
+ − 4053
+ − 4054 if (str->direction == CODING_ENCODE)
+ − 4055 return str->codesys;
+ − 4056
+ − 4057 if (!data->c.initted)
+ − 4058 return Fget_coding_system (Qundecided);
+ − 4059
+ − 4060 ret = coding_stream_canonicalize_after_coding
+ − 4061 (XLSTREAM (data->c.lstreams[0]));
+ − 4062 if (NILP (ret))
+ − 4063 ret = Fget_coding_system (Qundecided);
+ − 4064 if (XCODING_SYSTEM_EOL_TYPE (ret) != EOL_AUTODETECT)
+ − 4065 return ret;
+ − 4066 eolret = coding_stream_canonicalize_after_coding
+ − 4067 (XLSTREAM (data->c.lstreams[1]));
+ − 4068 if (!EQ (XCODING_SYSTEM_TYPE (eolret), Qconvert_eol))
+ − 4069 return ret;
+ − 4070 return
+ − 4071 Fsubsidiary_coding_system (ret, Fcoding_system_property (eolret,
+ − 4072 Qsubtype));
+ − 4073 }
+ − 4074
+ − 4075
+ − 4076 /************************************************************************/
+ − 4077 /* Lisp interface: Coding category functions and detection */
+ − 4078 /************************************************************************/
+ − 4079
+ − 4080 DEFUN ("coding-category-list", Fcoding_category_list, 0, 0, 0, /*
+ − 4081 Return a list of all recognized coding categories.
+ − 4082 */
+ − 4083 ())
+ − 4084 {
+ − 4085 int i;
+ − 4086 Lisp_Object list = Qnil;
+ − 4087
+ − 4088 for (i = 0; i < coding_detector_count; i++)
+ − 4089 {
+ − 4090 detector_category_dynarr *cats =
+ − 4091 Dynarr_at (all_coding_detectors, i).cats;
+ − 4092 int j;
+ − 4093
+ − 4094 for (j = 0; j < Dynarr_length (cats); j++)
+ − 4095 list = Fcons (Dynarr_at (cats, j).sym, list);
+ − 4096 }
+ − 4097
+ − 4098 return Fnreverse (list);
+ − 4099 }
+ − 4100
+ − 4101 DEFUN ("set-coding-priority-list", Fset_coding_priority_list, 1, 1, 0, /*
+ − 4102 Change the priority order of the coding categories.
+ − 4103 LIST should be list of coding categories, in descending order of
+ − 4104 priority. Unspecified coding categories will be lower in priority
+ − 4105 than all specified ones, in the same relative order they were in
+ − 4106 previously.
+ − 4107 */
+ − 4108 (list))
+ − 4109 {
+ − 4110 int *category_to_priority =
+ − 4111 alloca_array (int, coding_detector_category_count);
+ − 4112 int i, j;
+ − 4113 Lisp_Object rest;
+ − 4114
+ − 4115 /* First generate a list that maps coding categories to priorities. */
+ − 4116
+ − 4117 for (i = 0; i < coding_detector_category_count; i++)
+ − 4118 category_to_priority[i] = -1;
+ − 4119
+ − 4120 /* Highest priority comes from the specified list. */
+ − 4121 i = 0;
+ − 4122 EXTERNAL_LIST_LOOP (rest, list)
+ − 4123 {
+ − 4124 int cat = coding_category_symbol_to_id (XCAR (rest));
+ − 4125
+ − 4126 if (category_to_priority[cat] >= 0)
+ − 4127 sferror ("Duplicate coding category in list", XCAR (rest));
+ − 4128 category_to_priority[cat] = i++;
+ − 4129 }
+ − 4130
+ − 4131 /* Now go through the existing categories by priority to retrieve
+ − 4132 the categories not yet specified and preserve their priority
+ − 4133 order. */
+ − 4134 for (j = 0; j < coding_detector_category_count; j++)
+ − 4135 {
+ − 4136 int cat = coding_category_by_priority[j];
+ − 4137 if (category_to_priority[cat] < 0)
+ − 4138 category_to_priority[cat] = i++;
+ − 4139 }
+ − 4140
+ − 4141 /* Now we need to construct the inverse of the mapping we just
+ − 4142 constructed. */
+ − 4143
+ − 4144 for (i = 0; i < coding_detector_category_count; i++)
+ − 4145 coding_category_by_priority[category_to_priority[i]] = i;
+ − 4146
+ − 4147 /* Phew! That was confusing. */
+ − 4148 return Qnil;
+ − 4149 }
+ − 4150
+ − 4151 DEFUN ("coding-priority-list", Fcoding_priority_list, 0, 0, 0, /*
+ − 4152 Return a list of coding categories in descending order of priority.
+ − 4153 */
+ − 4154 ())
+ − 4155 {
+ − 4156 int i;
+ − 4157 Lisp_Object list = Qnil;
+ − 4158
+ − 4159 for (i = 0; i < coding_detector_category_count; i++)
+ − 4160 list =
+ − 4161 Fcons (coding_category_id_to_symbol (coding_category_by_priority[i]),
+ − 4162 list);
+ − 4163 return Fnreverse (list);
+ − 4164 }
+ − 4165
+ − 4166 DEFUN ("set-coding-category-system", Fset_coding_category_system, 2, 2, 0, /*
+ − 4167 Change the coding system associated with a coding category.
+ − 4168 */
+ − 4169 (coding_category, coding_system))
+ − 4170 {
+ − 4171 coding_category_system[coding_category_symbol_to_id (coding_category)] =
+ − 4172 Fget_coding_system (coding_system);
+ − 4173 return Qnil;
+ − 4174 }
+ − 4175
+ − 4176 DEFUN ("coding-category-system", Fcoding_category_system, 1, 1, 0, /*
+ − 4177 Return the coding system associated with a coding category.
+ − 4178 */
+ − 4179 (coding_category))
+ − 4180 {
+ − 4181 Lisp_Object sys =
+ − 4182 coding_category_system[coding_category_symbol_to_id (coding_category)];
+ − 4183
+ − 4184 if (!NILP (sys))
+ − 4185 return XCODING_SYSTEM_NAME (sys);
+ − 4186 return Qnil;
+ − 4187 }
+ − 4188
800
+ − 4189 /* Detect the encoding of STREAM. Assumes stream is at the begnning and will
+ − 4190 read through to the end of STREAM, leaving it there but open. */
+ − 4191
771
+ − 4192 Lisp_Object
+ − 4193 detect_coding_stream (Lisp_Object stream)
+ − 4194 {
+ − 4195 Lisp_Object val = Qnil;
+ − 4196 struct gcpro gcpro1, gcpro2, gcpro3;
+ − 4197 UExtbyte random_buffer[65536];
+ − 4198 Lisp_Object binary_instream =
+ − 4199 make_coding_input_stream
+ − 4200 (XLSTREAM (stream), Qbinary,
814
+ − 4201 CODING_ENCODE, LSTREAM_FL_NO_CLOSE_OTHER);
771
+ − 4202 Lisp_Object decstream =
+ − 4203 make_coding_input_stream
+ − 4204 (XLSTREAM (binary_instream),
800
+ − 4205 Qundecided, CODING_DECODE, 0);
771
+ − 4206 Lstream *decstr = XLSTREAM (decstream);
+ − 4207
+ − 4208 GCPRO3 (decstream, stream, binary_instream);
+ − 4209 /* Read and discard all data; detection happens as a side effect of this,
+ − 4210 and we examine what was detected afterwards. */
+ − 4211 while (Lstream_read (decstr, random_buffer, sizeof (random_buffer)) > 0)
+ − 4212 ;
+ − 4213
+ − 4214 val = coding_stream_detected_coding_system (decstr);
+ − 4215 Lstream_close (decstr);
+ − 4216 Lstream_delete (decstr);
+ − 4217 Lstream_delete (XLSTREAM (binary_instream));
+ − 4218 UNGCPRO;
+ − 4219 return val;
428
+ − 4220 }
+ − 4221
+ − 4222 DEFUN ("detect-coding-region", Fdetect_coding_region, 2, 3, 0, /*
+ − 4223 Detect coding system of the text in the region between START and END.
444
+ − 4224 Return a list of possible coding systems ordered by priority.
+ − 4225 If only ASCII characters are found, return 'undecided or one of
428
+ − 4226 its subsidiary coding systems according to a detected end-of-line
+ − 4227 type. Optional arg BUFFER defaults to the current buffer.
+ − 4228 */
+ − 4229 (start, end, buffer))
+ − 4230 {
+ − 4231 Lisp_Object val = Qnil;
+ − 4232 struct buffer *buf = decode_buffer (buffer, 0);
665
+ − 4233 Charbpos b, e;
771
+ − 4234 Lisp_Object lb_instream;
428
+ − 4235
+ − 4236 get_buffer_range_char (buf, start, end, &b, &e, 0);
+ − 4237 lb_instream = make_lisp_buffer_input_stream (buf, b, e, 0);
771
+ − 4238
+ − 4239 val = detect_coding_stream (lb_instream);
+ − 4240 Lstream_delete (XLSTREAM (lb_instream));
428
+ − 4241 return val;
+ − 4242 }
+ − 4243
+ − 4244
771
+ − 4245
+ − 4246 #ifdef DEBUG_XEMACS
+ − 4247
428
+ − 4248 /************************************************************************/
771
+ − 4249 /* Internal methods */
+ − 4250 /************************************************************************/
+ − 4251
+ − 4252 /* Raw (internally-formatted) data. */
+ − 4253 DEFINE_CODING_SYSTEM_TYPE (internal);
428
+ − 4254
665
+ − 4255 static Bytecount
771
+ − 4256 internal_convert (struct coding_stream *str, const UExtbyte *src,
+ − 4257 unsigned_char_dynarr *dst, Bytecount n)
+ − 4258 {
+ − 4259 Bytecount orign = n;
+ − 4260 Dynarr_add_many (dst, src, n);
+ − 4261 return orign;
+ − 4262 }
+ − 4263
+ − 4264 #endif /* DEBUG_XEMACS */
+ − 4265
+ − 4266
+ − 4267
+ − 4268 #ifdef HAVE_ZLIB
+ − 4269
+ − 4270 /************************************************************************/
+ − 4271 /* Gzip methods */
+ − 4272 /************************************************************************/
+ − 4273
+ − 4274 DEFINE_CODING_SYSTEM_TYPE (gzip);
+ − 4275
+ − 4276 struct gzip_coding_system
428
+ − 4277 {
771
+ − 4278 int level; /* 0 through 9, or -1 for default */
+ − 4279 };
+ − 4280
+ − 4281 #define CODING_SYSTEM_GZIP_LEVEL(codesys) \
+ − 4282 (CODING_SYSTEM_TYPE_DATA (codesys, gzip)->level)
+ − 4283 #define XCODING_SYSTEM_GZIP_LEVEL(codesys) \
+ − 4284 (XCODING_SYSTEM_TYPE_DATA (codesys, gzip)->level)
+ − 4285
+ − 4286 struct gzip_coding_stream
428
+ − 4287 {
771
+ − 4288 z_stream stream;
+ − 4289 int stream_initted;
+ − 4290 int reached_eof; /* #### this should be handled by the caller, once we
+ − 4291 return LSTREAM_EOF */
+ − 4292 };
+ − 4293
+ − 4294 static const struct lrecord_description
+ − 4295 gzip_coding_system_description[] = {
+ − 4296 { XD_END }
+ − 4297 };
+ − 4298
+ − 4299 enum source_sink_type
+ − 4300 gzip_conversion_end_type (Lisp_Object codesys)
+ − 4301 {
+ − 4302 return DECODES_BYTE_TO_BYTE;
428
+ − 4303 }
+ − 4304
+ − 4305 static void
771
+ − 4306 gzip_init (Lisp_Object codesys)
+ − 4307 {
+ − 4308 struct gzip_coding_system *data = XCODING_SYSTEM_TYPE_DATA (codesys, gzip);
+ − 4309 data->level = -1;
+ − 4310 }
+ − 4311
+ − 4312 static void
+ − 4313 gzip_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
428
+ − 4314 {
771
+ − 4315 struct gzip_coding_system *data = XCODING_SYSTEM_TYPE_DATA (cs, gzip);
+ − 4316
826
+ − 4317 write_c_string (printcharfun, "(");
771
+ − 4318 if (data->level == -1)
826
+ − 4319 write_c_string (printcharfun, "default");
771
+ − 4320 else
+ − 4321 print_internal (make_int (data->level), printcharfun, 0);
826
+ − 4322 write_c_string (printcharfun, ")");
428
+ − 4323 }
+ − 4324
+ − 4325 static int
771
+ − 4326 gzip_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
428
+ − 4327 {
771
+ − 4328 struct gzip_coding_system *data = XCODING_SYSTEM_TYPE_DATA (codesys, gzip);
+ − 4329
+ − 4330 if (EQ (key, Qlevel))
428
+ − 4331 {
771
+ − 4332 if (EQ (value, Qdefault))
+ − 4333 data->level = -1;
+ − 4334 else
428
+ − 4335 {
771
+ − 4336 CHECK_INT (value);
+ − 4337 check_int_range (XINT (value), 0, 9);
+ − 4338 data->level = XINT (value);
428
+ − 4339 }
+ − 4340 }
+ − 4341 else
771
+ − 4342 return 0;
+ − 4343 return 1;
428
+ − 4344 }
+ − 4345
+ − 4346 static Lisp_Object
771
+ − 4347 gzip_getprop (Lisp_Object coding_system, Lisp_Object prop)
428
+ − 4348 {
771
+ − 4349 struct gzip_coding_system *data =
+ − 4350 XCODING_SYSTEM_TYPE_DATA (coding_system, gzip);
+ − 4351
+ − 4352 if (EQ (prop, Qlevel))
428
+ − 4353 {
771
+ − 4354 if (data->level == -1)
+ − 4355 return Qdefault;
+ − 4356 return make_int (data->level);
428
+ − 4357 }
771
+ − 4358
+ − 4359 return Qunbound;
428
+ − 4360 }
+ − 4361
+ − 4362 static void
771
+ − 4363 gzip_init_coding_stream (struct coding_stream *str)
428
+ − 4364 {
771
+ − 4365 struct gzip_coding_stream *data = CODING_STREAM_TYPE_DATA (str, gzip);
+ − 4366 if (data->stream_initted)
428
+ − 4367 {
771
+ − 4368 if (str->direction == CODING_DECODE)
+ − 4369 inflateEnd (&data->stream);
+ − 4370 else
+ − 4371 deflateEnd (&data->stream);
+ − 4372 data->stream_initted = 0;
428
+ − 4373 }
771
+ − 4374 data->reached_eof = 0;
428
+ − 4375 }
+ − 4376
+ − 4377 static void
771
+ − 4378 gzip_rewind_coding_stream (struct coding_stream *str)
428
+ − 4379 {
771
+ − 4380 gzip_init_coding_stream (str);
428
+ − 4381 }
+ − 4382
771
+ − 4383 static Bytecount
+ − 4384 gzip_convert (struct coding_stream *str,
+ − 4385 const UExtbyte *src,
+ − 4386 unsigned_char_dynarr *dst, Bytecount n)
428
+ − 4387 {
771
+ − 4388 struct gzip_coding_stream *data = CODING_STREAM_TYPE_DATA (str, gzip);
+ − 4389 int zerr;
+ − 4390 if (str->direction == CODING_DECODE)
428
+ − 4391 {
771
+ − 4392 if (data->reached_eof)
+ − 4393 return n; /* eat the data */
+ − 4394
+ − 4395 if (!data->stream_initted)
428
+ − 4396 {
771
+ − 4397 xzero (data->stream);
+ − 4398 if (inflateInit (&data->stream) != Z_OK)
+ − 4399 return LSTREAM_ERROR;
+ − 4400 data->stream_initted = 1;
428
+ − 4401 }
771
+ − 4402
+ − 4403 data->stream.next_in = (Bytef *) src;
+ − 4404 data->stream.avail_in = n;
+ − 4405
+ − 4406 /* Normally we stop when we've fed all data to the decompressor; but
+ − 4407 if we're at the end of the input, and the decompressor hasn't
+ − 4408 reported EOF, we need to keep going, as there might be more output
+ − 4409 to generate. Z_OK from the decompressor means input was processed
+ − 4410 or output was generated; if neither, we break out of the loop.
+ − 4411 Other return values are:
+ − 4412
+ − 4413 Z_STREAM_END EOF from decompressor
+ − 4414 Z_DATA_ERROR Corrupted data
+ − 4415 Z_BUF_ERROR No progress possible (this should happen if
+ − 4416 we try to feed it an incomplete file)
+ − 4417 Z_MEM_ERROR Out of memory
+ − 4418 Z_STREAM_ERROR (should never happen)
+ − 4419 Z_NEED_DICT (#### when will this happen?)
+ − 4420 */
+ − 4421 while (data->stream.avail_in > 0 || str->eof)
+ − 4422 {
+ − 4423 /* Reserve an output buffer of the same size as the input buffer;
+ − 4424 if that's not enough, we keep reserving the same size. */
+ − 4425 Bytecount reserved = n;
+ − 4426 Dynarr_add_many (dst, 0, reserved);
+ − 4427 /* Careful here! Don't retrieve the pointer until after
+ − 4428 reserving the space, or it might be bogus */
+ − 4429 data->stream.next_out =
+ − 4430 Dynarr_atp (dst, Dynarr_length (dst) - reserved);
+ − 4431 data->stream.avail_out = reserved;
+ − 4432 zerr = inflate (&data->stream, Z_NO_FLUSH);
+ − 4433 /* Lop off the unused portion */
+ − 4434 Dynarr_set_size (dst, Dynarr_length (dst) - data->stream.avail_out);
+ − 4435 if (zerr != Z_OK)
+ − 4436 break;
+ − 4437 }
+ − 4438
+ − 4439 if (zerr == Z_STREAM_END)
+ − 4440 data->reached_eof = 1;
+ − 4441
+ − 4442 if ((Bytecount) data->stream.avail_in < n)
+ − 4443 return n - data->stream.avail_in;
+ − 4444
+ − 4445 if (zerr == Z_OK || zerr == Z_STREAM_END)
+ − 4446 return 0;
+ − 4447
+ − 4448 return LSTREAM_ERROR;
428
+ − 4449 }
+ − 4450 else
+ − 4451 {
771
+ − 4452 if (!data->stream_initted)
+ − 4453 {
+ − 4454 int level = XCODING_SYSTEM_GZIP_LEVEL (str->codesys);
+ − 4455 xzero (data->stream);
+ − 4456 if (deflateInit (&data->stream,
+ − 4457 level == -1 ? Z_DEFAULT_COMPRESSION : level) !=
+ − 4458 Z_OK)
+ − 4459 return LSTREAM_ERROR;
+ − 4460 data->stream_initted = 1;
428
+ − 4461 }
771
+ − 4462
+ − 4463 data->stream.next_in = (Bytef *) src;
+ − 4464 data->stream.avail_in = n;
+ − 4465
+ − 4466 /* Normally we stop when we've fed all data to the compressor; but if
+ − 4467 we're at the end of the input, and the compressor hasn't reported
+ − 4468 EOF, we need to keep going, as there might be more output to
+ − 4469 generate. (To signal EOF on our end, we set the FLUSH parameter
+ − 4470 to Z_FINISH; when all data is output, Z_STREAM_END will be
+ − 4471 returned.) Z_OK from the compressor means input was processed or
+ − 4472 output was generated; if neither, we break out of the loop. Other
+ − 4473 return values are:
+ − 4474
+ − 4475 Z_STREAM_END EOF from compressor
+ − 4476 Z_BUF_ERROR No progress possible (should never happen)
+ − 4477 Z_STREAM_ERROR (should never happen)
+ − 4478 */
+ − 4479 while (data->stream.avail_in > 0 || str->eof)
+ − 4480 {
+ − 4481 /* Reserve an output buffer of the same size as the input buffer;
+ − 4482 if that's not enough, we keep reserving the same size. */
+ − 4483 Bytecount reserved = n;
+ − 4484 Dynarr_add_many (dst, 0, reserved);
+ − 4485 /* Careful here! Don't retrieve the pointer until after
+ − 4486 reserving the space, or it might be bogus */
+ − 4487 data->stream.next_out =
+ − 4488 Dynarr_atp (dst, Dynarr_length (dst) - reserved);
+ − 4489 data->stream.avail_out = reserved;
+ − 4490 zerr =
+ − 4491 deflate (&data->stream,
+ − 4492 str->eof ? Z_FINISH : Z_NO_FLUSH);
+ − 4493 /* Lop off the unused portion */
+ − 4494 Dynarr_set_size (dst, Dynarr_length (dst) - data->stream.avail_out);
+ − 4495 if (zerr != Z_OK)
+ − 4496 break;
+ − 4497 }
+ − 4498
+ − 4499 if ((Bytecount) data->stream.avail_in < n)
+ − 4500 return n - data->stream.avail_in;
+ − 4501
+ − 4502 if (zerr == Z_OK || zerr == Z_STREAM_END)
+ − 4503 return 0;
+ − 4504
+ − 4505 return LSTREAM_ERROR;
428
+ − 4506 }
+ − 4507 }
+ − 4508
771
+ − 4509 #endif /* HAVE_ZLIB */
428
+ − 4510
+ − 4511
+ − 4512 /************************************************************************/
+ − 4513 /* Initialization */
+ − 4514 /************************************************************************/
+ − 4515
+ − 4516 void
+ − 4517 syms_of_file_coding (void)
+ − 4518 {
442
+ − 4519 INIT_LRECORD_IMPLEMENTATION (coding_system);
+ − 4520
771
+ − 4521 DEFSUBR (Fvalid_coding_system_type_p);
+ − 4522 DEFSUBR (Fcoding_system_type_list);
428
+ − 4523 DEFSUBR (Fcoding_system_p);
+ − 4524 DEFSUBR (Ffind_coding_system);
+ − 4525 DEFSUBR (Fget_coding_system);
+ − 4526 DEFSUBR (Fcoding_system_list);
+ − 4527 DEFSUBR (Fcoding_system_name);
+ − 4528 DEFSUBR (Fmake_coding_system);
+ − 4529 DEFSUBR (Fcopy_coding_system);
440
+ − 4530 DEFSUBR (Fcoding_system_canonical_name_p);
+ − 4531 DEFSUBR (Fcoding_system_alias_p);
+ − 4532 DEFSUBR (Fcoding_system_aliasee);
428
+ − 4533 DEFSUBR (Fdefine_coding_system_alias);
+ − 4534 DEFSUBR (Fsubsidiary_coding_system);
771
+ − 4535 DEFSUBR (Fcoding_system_base);
+ − 4536 DEFSUBR (Fcoding_system_used_for_io);
428
+ − 4537
+ − 4538 DEFSUBR (Fcoding_system_type);
771
+ − 4539 DEFSUBR (Fcoding_system_description);
428
+ − 4540 DEFSUBR (Fcoding_system_property);
+ − 4541
+ − 4542 DEFSUBR (Fcoding_category_list);
+ − 4543 DEFSUBR (Fset_coding_priority_list);
+ − 4544 DEFSUBR (Fcoding_priority_list);
+ − 4545 DEFSUBR (Fset_coding_category_system);
+ − 4546 DEFSUBR (Fcoding_category_system);
+ − 4547
+ − 4548 DEFSUBR (Fdetect_coding_region);
+ − 4549 DEFSUBR (Fdecode_coding_region);
+ − 4550 DEFSUBR (Fencode_coding_region);
563
+ − 4551 DEFSYMBOL_MULTIWORD_PREDICATE (Qcoding_systemp);
+ − 4552 DEFSYMBOL (Qno_conversion);
771
+ − 4553 DEFSYMBOL (Qconvert_eol);
+ − 4554 DEFSYMBOL (Qconvert_eol_autodetect);
+ − 4555 DEFSYMBOL (Qconvert_eol_lf);
+ − 4556 DEFSYMBOL (Qconvert_eol_cr);
+ − 4557 DEFSYMBOL (Qconvert_eol_crlf);
563
+ − 4558 DEFSYMBOL (Qraw_text);
771
+ − 4559
563
+ − 4560 DEFSYMBOL (Qmnemonic);
+ − 4561 DEFSYMBOL (Qeol_type);
+ − 4562 DEFSYMBOL (Qpost_read_conversion);
+ − 4563 DEFSYMBOL (Qpre_write_conversion);
+ − 4564
771
+ − 4565 DEFSYMBOL (Qtranslation_table_for_decode);
+ − 4566 DEFSYMBOL (Qtranslation_table_for_encode);
+ − 4567 DEFSYMBOL (Qsafe_chars);
+ − 4568 DEFSYMBOL (Qsafe_charsets);
+ − 4569 DEFSYMBOL (Qmime_charset);
+ − 4570 DEFSYMBOL (Qvalid_codes);
+ − 4571
563
+ − 4572 DEFSYMBOL (Qcr);
+ − 4573 DEFSYMBOL (Qlf);
+ − 4574 DEFSYMBOL (Qcrlf);
+ − 4575 DEFSYMBOL (Qeol_cr);
+ − 4576 DEFSYMBOL (Qeol_lf);
+ − 4577 DEFSYMBOL (Qeol_crlf);
+ − 4578 DEFSYMBOL (Qencode);
+ − 4579 DEFSYMBOL (Qdecode);
428
+ − 4580
771
+ − 4581 DEFSYMBOL (Qnear_certainty);
+ − 4582 DEFSYMBOL (Qquite_probable);
+ − 4583 DEFSYMBOL (Qsomewhat_likely);
+ − 4584 DEFSYMBOL (Qas_likely_as_unlikely);
+ − 4585 DEFSYMBOL (Qsomewhat_unlikely);
+ − 4586 DEFSYMBOL (Qquite_improbable);
+ − 4587 DEFSYMBOL (Qnearly_impossible);
+ − 4588
+ − 4589 DEFSYMBOL (Qdo_eol);
+ − 4590 DEFSYMBOL (Qdo_coding);
+ − 4591
+ − 4592 DEFSYMBOL (Qcanonicalize_after_coding);
+ − 4593
+ − 4594 DEFSYMBOL (Qescape_quoted);
+ − 4595
+ − 4596 #ifdef HAVE_ZLIB
+ − 4597 DEFSYMBOL (Qgzip);
+ − 4598 #endif
+ − 4599
+ − 4600 /* WARNING: The existing categories are intimately tied to the function
+ − 4601 `coding-system-category' in coding.el. If you change a category, or
+ − 4602 change the layout of any coding system associated with a category, you
+ − 4603 need to check that function and make sure it's written properly. */
+ − 4604
+ − 4605 #ifdef HAVE_DEFAULT_EOL_DETECTION
+ − 4606 Fprovide (intern ("unix-default-eol-detection"));
+ − 4607 #endif
428
+ − 4608 }
+ − 4609
+ − 4610 void
+ − 4611 lstream_type_create_file_coding (void)
+ − 4612 {
771
+ − 4613 LSTREAM_HAS_METHOD (coding, reader);
+ − 4614 LSTREAM_HAS_METHOD (coding, writer);
+ − 4615 LSTREAM_HAS_METHOD (coding, rewinder);
+ − 4616 LSTREAM_HAS_METHOD (coding, seekable_p);
+ − 4617 LSTREAM_HAS_METHOD (coding, marker);
+ − 4618 LSTREAM_HAS_METHOD (coding, flusher);
+ − 4619 LSTREAM_HAS_METHOD (coding, closer);
+ − 4620 LSTREAM_HAS_METHOD (coding, finalizer);
+ − 4621 }
+ − 4622
+ − 4623 void
+ − 4624 coding_system_type_create (void)
+ − 4625 {
+ − 4626 int i;
+ − 4627
+ − 4628 staticpro (&Vcoding_system_hash_table);
+ − 4629 Vcoding_system_hash_table =
+ − 4630 make_lisp_hash_table (50, HASH_TABLE_NON_WEAK, HASH_TABLE_EQ);
+ − 4631
+ − 4632 the_coding_system_type_entry_dynarr = Dynarr_new (coding_system_type_entry);
+ − 4633 dump_add_root_struct_ptr (&the_coding_system_type_entry_dynarr,
+ − 4634 &csted_description);
+ − 4635
+ − 4636 Vcoding_system_type_list = Qnil;
+ − 4637 staticpro (&Vcoding_system_type_list);
+ − 4638
+ − 4639 /* Initialize to something reasonable ... */
+ − 4640 for (i = 0; i < MAX_DETECTOR_CATEGORIES; i++)
+ − 4641 {
+ − 4642 coding_category_system[i] = Qnil;
+ − 4643 dump_add_root_object (&coding_category_system[i]);
+ − 4644 coding_category_by_priority[i] = i;
+ − 4645 }
+ − 4646
+ − 4647 dump_add_opaque (coding_category_by_priority,
+ − 4648 sizeof (coding_category_by_priority));
+ − 4649
+ − 4650 all_coding_detectors = Dynarr_new2 (detector_dynarr, struct detector);
+ − 4651 dump_add_root_struct_ptr (&all_coding_detectors,
+ − 4652 &detector_dynarr_description);
+ − 4653
+ − 4654 dump_add_opaque_int (&coding_system_tick);
+ − 4655 dump_add_opaque_int (&coding_detector_count);
+ − 4656 dump_add_opaque_int (&coding_detector_category_count);
+ − 4657
+ − 4658 INITIALIZE_CODING_SYSTEM_TYPE (no_conversion,
+ − 4659 "no-conversion-coding-system-p");
+ − 4660 CODING_SYSTEM_HAS_METHOD (no_conversion, convert);
+ − 4661
+ − 4662 INITIALIZE_DETECTOR (no_conversion);
+ − 4663 DETECTOR_HAS_METHOD (no_conversion, detect);
+ − 4664 INITIALIZE_DETECTOR_CATEGORY (no_conversion, no_conversion);
+ − 4665
+ − 4666 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (convert_eol,
+ − 4667 "convert-eol-coding-system-p");
+ − 4668 CODING_SYSTEM_HAS_METHOD (convert_eol, print);
+ − 4669 CODING_SYSTEM_HAS_METHOD (convert_eol, convert);
+ − 4670 CODING_SYSTEM_HAS_METHOD (convert_eol, getprop);
+ − 4671 CODING_SYSTEM_HAS_METHOD (convert_eol, putprop);
+ − 4672 CODING_SYSTEM_HAS_METHOD (convert_eol, conversion_end_type);
+ − 4673 CODING_SYSTEM_HAS_METHOD (convert_eol, canonicalize_after_coding);
+ − 4674 CODING_SYSTEM_HAS_METHOD (convert_eol, init_coding_stream);
+ − 4675
+ − 4676 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (undecided,
+ − 4677 "undecided-coding-system-p");
+ − 4678 CODING_SYSTEM_HAS_METHOD (undecided, init);
+ − 4679 CODING_SYSTEM_HAS_METHOD (undecided, mark);
+ − 4680 CODING_SYSTEM_HAS_METHOD (undecided, print);
+ − 4681 CODING_SYSTEM_HAS_METHOD (undecided, convert);
+ − 4682 CODING_SYSTEM_HAS_METHOD (undecided, putprop);
+ − 4683 CODING_SYSTEM_HAS_METHOD (undecided, getprop);
+ − 4684 CODING_SYSTEM_HAS_METHOD (undecided, init_coding_stream);
+ − 4685 CODING_SYSTEM_HAS_METHOD (undecided, rewind_coding_stream);
+ − 4686 CODING_SYSTEM_HAS_METHOD (undecided, finalize_coding_stream);
+ − 4687 CODING_SYSTEM_HAS_METHOD (undecided, mark_coding_stream);
+ − 4688 CODING_SYSTEM_HAS_METHOD (undecided, canonicalize);
+ − 4689 CODING_SYSTEM_HAS_METHOD (undecided, canonicalize_after_coding);
+ − 4690
+ − 4691 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (chain, "chain-coding-system-p");
+ − 4692
+ − 4693 CODING_SYSTEM_HAS_METHOD (chain, print);
+ − 4694 CODING_SYSTEM_HAS_METHOD (chain, canonicalize);
+ − 4695 CODING_SYSTEM_HAS_METHOD (chain, init);
+ − 4696 CODING_SYSTEM_HAS_METHOD (chain, mark);
+ − 4697 CODING_SYSTEM_HAS_METHOD (chain, mark_coding_stream);
+ − 4698 CODING_SYSTEM_HAS_METHOD (chain, convert);
+ − 4699 CODING_SYSTEM_HAS_METHOD (chain, rewind_coding_stream);
+ − 4700 CODING_SYSTEM_HAS_METHOD (chain, finalize_coding_stream);
+ − 4701 CODING_SYSTEM_HAS_METHOD (chain, finalize);
+ − 4702 CODING_SYSTEM_HAS_METHOD (chain, putprop);
+ − 4703 CODING_SYSTEM_HAS_METHOD (chain, getprop);
+ − 4704 CODING_SYSTEM_HAS_METHOD (chain, conversion_end_type);
+ − 4705 CODING_SYSTEM_HAS_METHOD (chain, canonicalize_after_coding);
+ − 4706
+ − 4707 #ifdef DEBUG_XEMACS
+ − 4708 INITIALIZE_CODING_SYSTEM_TYPE (internal, "internal-coding-system-p");
+ − 4709 CODING_SYSTEM_HAS_METHOD (internal, convert);
+ − 4710 #endif
+ − 4711
+ − 4712 #ifdef HAVE_ZLIB
+ − 4713 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (gzip, "gzip-coding-system-p");
+ − 4714 CODING_SYSTEM_HAS_METHOD (gzip, conversion_end_type);
+ − 4715 CODING_SYSTEM_HAS_METHOD (gzip, convert);
+ − 4716 CODING_SYSTEM_HAS_METHOD (gzip, init);
+ − 4717 CODING_SYSTEM_HAS_METHOD (gzip, print);
+ − 4718 CODING_SYSTEM_HAS_METHOD (gzip, init_coding_stream);
+ − 4719 CODING_SYSTEM_HAS_METHOD (gzip, rewind_coding_stream);
+ − 4720 CODING_SYSTEM_HAS_METHOD (gzip, putprop);
+ − 4721 CODING_SYSTEM_HAS_METHOD (gzip, getprop);
+ − 4722 #endif
+ − 4723 }
+ − 4724
+ − 4725 void
+ − 4726 reinit_coding_system_type_create (void)
+ − 4727 {
+ − 4728 REINITIALIZE_CODING_SYSTEM_TYPE (no_conversion);
+ − 4729 REINITIALIZE_CODING_SYSTEM_TYPE (convert_eol);
+ − 4730 REINITIALIZE_CODING_SYSTEM_TYPE (undecided);
+ − 4731 REINITIALIZE_CODING_SYSTEM_TYPE (chain);
+ − 4732 #if 0
+ − 4733 REINITIALIZE_CODING_SYSTEM_TYPE (text_file_wrapper);
+ − 4734 #endif /* 0 */
+ − 4735 #ifdef DEBUG_XEMACS
+ − 4736 REINITIALIZE_CODING_SYSTEM_TYPE (internal);
+ − 4737 #endif
+ − 4738 #ifdef HAVE_ZLIB
+ − 4739 REINITIALIZE_CODING_SYSTEM_TYPE (gzip);
+ − 4740 #endif
+ − 4741 }
+ − 4742
+ − 4743 void
+ − 4744 reinit_vars_of_file_coding (void)
+ − 4745 {
428
+ − 4746 }
+ − 4747
+ − 4748 void
+ − 4749 vars_of_file_coding (void)
+ − 4750 {
771
+ − 4751 reinit_vars_of_file_coding ();
+ − 4752
+ − 4753 /* We always have file-coding support */
428
+ − 4754 Fprovide (intern ("file-coding"));
+ − 4755
+ − 4756 DEFVAR_LISP ("keyboard-coding-system", &Vkeyboard_coding_system /*
+ − 4757 Coding system used for TTY keyboard input.
+ − 4758 Not used under a windowing system.
+ − 4759 */ );
+ − 4760 Vkeyboard_coding_system = Qnil;
+ − 4761
+ − 4762 DEFVAR_LISP ("terminal-coding-system", &Vterminal_coding_system /*
+ − 4763 Coding system used for TTY display output.
+ − 4764 Not used under a windowing system.
+ − 4765 */ );
+ − 4766 Vterminal_coding_system = Qnil;
+ − 4767
+ − 4768 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read /*
440
+ − 4769 Overriding coding system used when reading from a file or process.
+ − 4770 You should bind this variable with `let', but do not set it globally.
+ − 4771 If this is non-nil, it specifies the coding system that will be used
+ − 4772 to decode input on read operations, such as from a file or process.
+ − 4773 It overrides `buffer-file-coding-system-for-read',
428
+ − 4774 `insert-file-contents-pre-hook', etc. Use those variables instead of
440
+ − 4775 this one for permanent changes to the environment. */ );
428
+ − 4776 Vcoding_system_for_read = Qnil;
+ − 4777
+ − 4778 DEFVAR_LISP ("coding-system-for-write",
+ − 4779 &Vcoding_system_for_write /*
440
+ − 4780 Overriding coding system used when writing to a file or process.
+ − 4781 You should bind this variable with `let', but do not set it globally.
+ − 4782 If this is non-nil, it specifies the coding system that will be used
+ − 4783 to encode output for write operations, such as to a file or process.
+ − 4784 It overrides `buffer-file-coding-system', `write-region-pre-hook', etc.
+ − 4785 Use those variables instead of this one for permanent changes to the
+ − 4786 environment. */ );
428
+ − 4787 Vcoding_system_for_write = Qnil;
+ − 4788
+ − 4789 DEFVAR_LISP ("file-name-coding-system", &Vfile_name_coding_system /*
+ − 4790 Coding system used to convert pathnames when accessing files.
+ − 4791 */ );
+ − 4792 Vfile_name_coding_system = Qnil;
+ − 4793
+ − 4794 DEFVAR_BOOL ("enable-multibyte-characters", &enable_multibyte_characters /*
771
+ − 4795 Setting this has no effect. It is purely for FSF compatibility.
428
+ − 4796 */ );
+ − 4797 enable_multibyte_characters = 1;
771
+ − 4798
+ − 4799 Vchain_canonicalize_hash_table =
+ − 4800 make_lisp_hash_table (50, HASH_TABLE_NON_WEAK, HASH_TABLE_EQUAL);
+ − 4801 staticpro (&Vchain_canonicalize_hash_table);
+ − 4802
+ − 4803 #ifdef DEBUG_XEMACS
+ − 4804 DEFVAR_LISP ("debug-coding-detection", &Vdebug_coding_detection /*
+ − 4805 If non-nil, display debug information about detection operations in progress.
+ − 4806 Information is displayed on stderr.
+ − 4807 */ );
+ − 4808 Vdebug_coding_detection = Qnil;
+ − 4809 #endif
428
+ − 4810 }
+ − 4811
+ − 4812 void
+ − 4813 complex_vars_of_file_coding (void)
+ − 4814 {
771
+ − 4815 Fmake_coding_system
+ − 4816 (Qconvert_eol_cr, Qconvert_eol,
+ − 4817 build_msg_string ("Convert CR to LF"),
+ − 4818 nconc2 (list6 (Qdocumentation,
+ − 4819 build_msg_string (
+ − 4820 "Converts CR (used to mark the end of a line on Macintosh systems) to LF\n"
+ − 4821 "(used internally and under Unix to mark the end of a line)."),
+ − 4822 Qmnemonic, build_string ("CR->LF"),
+ − 4823 Qsubtype, Qcr),
+ − 4824 /* VERY IMPORTANT! Tell make-coding-system not to generate
+ − 4825 subsidiaries -- it needs the coding systems we're creating
+ − 4826 to do so! */
+ − 4827 list2 (Qeol_type, Qlf)));
+ − 4828
+ − 4829 Fmake_coding_system
+ − 4830 (Qconvert_eol_lf, Qconvert_eol,
+ − 4831 build_msg_string ("Convert LF to LF (do nothing)"),
+ − 4832 nconc2 (list6 (Qdocumentation,
+ − 4833 build_msg_string (
+ − 4834 "Do nothing."),
+ − 4835 Qmnemonic, build_string ("LF->LF"),
+ − 4836 Qsubtype, Qlf),
+ − 4837 /* VERY IMPORTANT! Tell make-coding-system not to generate
+ − 4838 subsidiaries -- it needs the coding systems we're creating
+ − 4839 to do so! */
+ − 4840 list2 (Qeol_type, Qlf)));
+ − 4841
+ − 4842 Fmake_coding_system
+ − 4843 (Qconvert_eol_crlf, Qconvert_eol,
+ − 4844 build_msg_string ("Convert CRLF to LF"),
+ − 4845 nconc2 (list6 (Qdocumentation,
+ − 4846 build_msg_string (
+ − 4847 "Converts CR+LF (used to mark the end of a line on Macintosh systems) to LF\n"
+ − 4848 "(used internally and under Unix to mark the end of a line)."),
+ − 4849 Qmnemonic, build_string ("CRLF->LF"),
+ − 4850 Qsubtype, Qcrlf),
+ − 4851 /* VERY IMPORTANT! Tell make-coding-system not to generate
+ − 4852 subsidiaries -- it needs the coding systems we're creating
+ − 4853 to do so! */
+ − 4854 list2 (Qeol_type, Qlf)));
+ − 4855
+ − 4856 Fmake_coding_system
+ − 4857 (Qconvert_eol_autodetect, Qconvert_eol,
+ − 4858 build_msg_string ("Autodetect EOL type"),
+ − 4859 nconc2 (list6 (Qdocumentation,
+ − 4860 build_msg_string (
+ − 4861 "Autodetect the end-of-line type."),
+ − 4862 Qmnemonic, build_string ("Auto-EOL"),
793
+ − 4863 Qsubtype, Qnil),
771
+ − 4864 /* VERY IMPORTANT! Tell make-coding-system not to generate
+ − 4865 subsidiaries -- it needs the coding systems we're creating
+ − 4866 to do so! */
+ − 4867 list2 (Qeol_type, Qlf)));
+ − 4868
+ − 4869 Fmake_coding_system
+ − 4870 (Qundecided, Qundecided,
+ − 4871 build_msg_string ("Undecided (auto-detect)"),
+ − 4872 nconc2 (list4 (Qdocumentation,
+ − 4873 build_msg_string
+ − 4874 ("Automatically detects the correct encoding."),
+ − 4875 Qmnemonic, build_string ("Auto")),
+ − 4876 list6 (Qdo_eol, Qt, Qdo_coding, Qt,
+ − 4877 /* We do EOL detection ourselves so we don't need to be
+ − 4878 wrapped in an EOL detector. (It doesn't actually hurt,
+ − 4879 though, I don't think.) */
+ − 4880 Qeol_type, Qlf)));
+ − 4881
+ − 4882 Fmake_coding_system
+ − 4883 (intern ("undecided-dos"), Qundecided,
+ − 4884 build_msg_string ("Undecided (auto-detect) (CRLF)"),
+ − 4885 nconc2 (list4 (Qdocumentation,
+ − 4886 build_msg_string
+ − 4887 ("Automatically detects the correct encoding; EOL type of CRLF forced."),
+ − 4888 Qmnemonic, build_string ("Auto")),
+ − 4889 list4 (Qdo_coding, Qt,
+ − 4890 Qeol_type, Qcrlf)));
+ − 4891
+ − 4892 Fmake_coding_system
+ − 4893 (intern ("undecided-unix"), Qundecided,
+ − 4894 build_msg_string ("Undecided (auto-detect) (LF)"),
+ − 4895 nconc2 (list4 (Qdocumentation,
+ − 4896 build_msg_string
+ − 4897 ("Automatically detects the correct encoding; EOL type of LF forced."),
+ − 4898 Qmnemonic, build_string ("Auto")),
+ − 4899 list4 (Qdo_coding, Qt,
+ − 4900 Qeol_type, Qlf)));
+ − 4901
+ − 4902 Fmake_coding_system
+ − 4903 (intern ("undecided-mac"), Qundecided,
+ − 4904 build_msg_string ("Undecided (auto-detect) (CR)"),
+ − 4905 nconc2 (list4 (Qdocumentation,
+ − 4906 build_msg_string
+ − 4907 ("Automatically detects the correct encoding; EOL type of CR forced."),
+ − 4908 Qmnemonic, build_string ("Auto")),
+ − 4909 list4 (Qdo_coding, Qt,
+ − 4910 Qeol_type, Qcr)));
+ − 4911
428
+ − 4912 /* Need to create this here or we're really screwed. */
+ − 4913 Fmake_coding_system
+ − 4914 (Qraw_text, Qno_conversion,
771
+ − 4915 build_msg_string ("Raw Text"),
+ − 4916 list4 (Qdocumentation,
+ − 4917 build_msg_string ("Raw text converts only line-break codes, and acts otherwise like `binary'."),
+ − 4918 Qmnemonic, build_string ("Raw")));
428
+ − 4919
+ − 4920 Fmake_coding_system
+ − 4921 (Qbinary, Qno_conversion,
771
+ − 4922 build_msg_string ("Binary"),
+ − 4923 list6 (Qdocumentation,
+ − 4924 build_msg_string (
+ − 4925 "This coding system is as close as it comes to doing no conversion.\n"
+ − 4926 "On input, each byte is converted directly into the character\n"
+ − 4927 "with the corresponding code -- i.e. from the `ascii', `control-1',\n"
+ − 4928 "or `latin-1' character sets. On output, these characters are\n"
+ − 4929 "converted back to the corresponding bytes, and other characters\n"
+ − 4930 "are converted to the default character, i.e. `~'."),
+ − 4931 Qeol_type, Qlf,
428
+ − 4932 Qmnemonic, build_string ("Binary")));
+ − 4933
771
+ − 4934 /* Formerly aliased to raw-text! Completely bogus and not even the same
+ − 4935 as FSF Emacs. */
+ − 4936 Fdefine_coding_system_alias (Qno_conversion, Qbinary);
+ − 4937 Fdefine_coding_system_alias (intern ("no-conversion-unix"),
+ − 4938 intern ("raw-text-unix"));
+ − 4939 Fdefine_coding_system_alias (intern ("no-conversion-dos"),
+ − 4940 intern ("raw-text-dos"));
+ − 4941 Fdefine_coding_system_alias (intern ("no-conversion-mac"),
+ − 4942 intern ("raw-text-mac"));
+ − 4943
+ − 4944 /* These four below will get their defaults set correctly in
+ − 4945 code-init.el. We init them now so we can handle stuff at dump
+ − 4946 time before we get to code-init.el. */
440
+ − 4947 Fdefine_coding_system_alias (Qfile_name, Qbinary);
771
+ − 4948 Fdefine_coding_system_alias (Qnative, Qfile_name);
440
+ − 4949
+ − 4950 Fdefine_coding_system_alias (Qterminal, Qbinary);
+ − 4951 Fdefine_coding_system_alias (Qkeyboard, Qbinary);
+ − 4952
771
+ − 4953 Fdefine_coding_system_alias (Qidentity, Qconvert_eol_lf);
+ − 4954
428
+ − 4955 /* Need this for bootstrapping */
771
+ − 4956 coding_category_system[detector_category_no_conversion] =
428
+ − 4957 Fget_coding_system (Qraw_text);
+ − 4958 }