view lib-src/mmencode.c @ 4690:257b468bf2ca

Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. src/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. * mule-coding.c (struct fixed_width_coding_system): Add a new coding system type, fixed_width, and implement it. It uses the CCL infrastructure but has a much simpler creation API, and its own query_method, formerly in lisp/mule/mule-coding.el. * unicode.c: Move the Unicode query method implementation here from unicode.el. * lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table here. * intl-win32.c (complex_vars_of_intl_win32): Use Fmake_coding_system_internal, not Fmake_coding_system. * general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence here. * file-coding.h (enum coding_system_variant): Add fixed_width_coding_system here. (struct coding_system_methods): Add query_method and query_lstream_method to the coding system methods. Provide flags for the query methods. Declare the default query method; initialise it correctly in INITIALIZE_CODING_SYSTEM_TYPE. * file-coding.c (default_query_method): New function, the default query method for coding systems that do not set it. Moved from coding.el. (make_coding_system_1): Accept new elements in PROPS in #'make-coding-system; aliases, a list of aliases; safe-chars and safe-charsets (these were previously accepted but not saved); and category. (Fmake_coding_system_internal): New function, what used to be #'make-coding-system--on Mule builds, we've now moved some of the functionality of this to Lisp. (Fcoding_system_canonical_name_p): Move this earlier in the file, since it's now called from within make_coding_system_1. (Fquery_coding_region): Move the implementation of this here, from coding.el. (complex_vars_of_file_coding): Call Fmake_coding_system_internal, not Fmake_coding_system; specify safe-charsets properties when we're a mule build. * extents.h (mouse_highlight_priority, Fset_extent_priority, Fset_extent_face, Fmap_extents): Make these available to other C files. lisp/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. * coding.el: Consolidate code that depends on the presence or absence of Mule at the end of this file. (default-query-coding-region, query-coding-region): Move these functions to C. (default-query-coding-region-safe-charset-skip-chars-map): Remove this variable, the corresponding C variable is Vdefault_query_coding_region_chartab_cache in file-coding.c. (query-coding-string): Update docstring to reflect actual multiple values, be more careful about not modifying a range table that we're currently mapping over. (encode-coding-char): Make the implementation of this simpler. (featurep 'mule): Autoload #'make-coding-system from mule/make-coding-system.el if we're a mule build; provide an appropriate compiler macro. Do various non-mule compatibility things if we're not a mule build. * update-elc.el (additional-dump-dependencies): Add mule/make-coding-system as a dump time dependency if we're a mule build. * unicode.el (ccl-encode-to-ucs-2): (decode-char): (encode-char): Move these earlier in the file, for the sake of some byte compile warnings. (unicode-query-coding-region): Move this to unicode.c * mule/make-coding-system.el: New file, not dumped. Contains the functionality to rework the arguments necessary for fixed-width coding systems, and contains the implementation of #'make-coding-system, which now calls #'make-coding-system-internal. * mule/vietnamese.el (viscii): * mule/latin.el (iso-8859-2): (windows-1250): (iso-8859-3): (iso-8859-4): (iso-8859-14): (iso-8859-15): (iso-8859-16): (iso-8859-9): (macintosh): (windows-1252): * mule/hebrew.el (iso-8859-8): * mule/greek.el (iso-8859-7): (windows-1253): * mule/cyrillic.el (iso-8859-5): (koi8-r): (koi8-u): (windows-1251): (alternativnyj): (koi8-ru): (koi8-t): (koi8-c): (koi8-o): * mule/arabic.el (iso-8859-6): (windows-1256): Move all these coding systems to being of type fixed-width, not of type CCL. This allows the distinct query-coding-region for them to be in C, something which will eventually allow us to implement query-coding-region for the mswindows-multibyte coding systems. * mule/general-late.el (posix-charset-to-coding-system-hash): Document why we're pre-emptively persuading the byte compiler that the ELC for this file needs to be written using escape-quoted. Call #'set-unicode-query-skip-chars-args, now the Unicode query-coding-region implementation is in C. * mule/thai-xtis.el (tis-620): Don't bother checking whether we're XEmacs or not here. * mule/mule-coding.el: Move the eight bit fixed-width functionality from this file to make-coding-system.el. tests/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> * automated/mule-tests.el: Check a coding system's type, not an 8-bit-fixed property, for whether that coding system should be treated as a fixed-width coding system. * automated/query-coding-tests.el: Don't test the query coding functionality for mswindows-multibyte coding systems, it's not yet implemented.
author Aidan Kehoe <kehoea@parhasard.net>
date Sat, 19 Sep 2009 22:53:13 +0100
parents 49316578f12d
children
line wrap: on
line source

/*
Copyright (c) 1991 Bell Communications Research, Inc. (Bellcore)

Permission to use, copy, modify, and distribute this material 
for any purpose and without fee is hereby granted, provided 
that the above copyright notice and this permission notice 
appear in all copies, and that the name of Bellcore not be 
used in advertising or publicity pertaining to this 
material without the specific, prior written permission 
of an authorized representative of Bellcore.  BELLCORE 
MAKES NO REPRESENTATIONS ABOUT THE ACCURACY OR SUITABILITY 
OF THIS MATERIAL FOR ANY PURPOSE.  IT IS PROVIDED "AS IS", 
WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
*/

#ifdef HAVE_CONFIG_H
# include <config.h>
#endif
#define NEWLINE_CHAR '\n'
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <errno.h>

static void
output64chunk(int c1, int c2, int c3, int pads, FILE *outfile);

static signed char basis_64[] =
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static signed char index_64[128] = {
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,62, -1,-1,-1,63,
    52,53,54,55, 56,57,58,59, 60,61,-1,-1, -1,-1,-1,-1,
    -1, 0, 1, 2,  3, 4, 5, 6,  7, 8, 9,10, 11,12,13,14,
    15,16,17,18, 19,20,21,22, 23,24,25,-1, -1,-1,-1,-1,
    -1,26,27,28, 29,30,31,32, 33,34,35,36, 37,38,39,40,
    41,42,43,44, 45,46,47,48, 49,50,51,-1, -1,-1,-1,-1
};

#define char64(c)  (((c) < 0 || (c) > 127) ? -1 : index_64[(c)])

/*
char64(c)
char c;
{
    char *s = (char *) strchr(basis_64, c);
    if (s) return(s-basis_64);
    return(-1);
}
*/

/* the following gets a character, but fakes it properly into two chars if there's a newline character */
static int InNewline=0;

static int
nextcharin (FILE *infile, int PortableNewlines)
{
    int c;

#ifndef NEWLINE_CHAR
    return(getc(infile));
#else
    if (!PortableNewlines) return(getc(infile));
    if (InNewline) {
        InNewline = 0;
        return(10); /* LF */
    }
    c = getc(infile);
    if (c == NEWLINE_CHAR) {
        InNewline = 1;
        return(13); /* CR */
    }
    return(c);
#endif
}

static void
to64(FILE *infile, FILE *outfile, int PortableNewlines) 
{
    int c1, c2, c3, ct=0;
    InNewline = 0; /* always reset it */
    while ((c1 = nextcharin(infile, PortableNewlines)) != EOF) {
        c2 = nextcharin(infile, PortableNewlines);
        if (c2 == EOF) {
            output64chunk(c1, 0, 0, 2, outfile);
        } else {
            c3 = nextcharin(infile, PortableNewlines);
            if (c3 == EOF) {
                output64chunk(c1, c2, 0, 1, outfile);
            } else {
                output64chunk(c1, c2, c3, 0, outfile);
            }
        }
        ct += 4;
        if (ct > 71) {
            putc('\n', outfile);
            ct = 0;
        }
    }
    if (ct) putc('\n', outfile);
    fflush(outfile);
}

static void
output64chunk(int c1, int c2, int c3, int pads, FILE *outfile)
{
    putc(basis_64[c1>>2], outfile);
    putc(basis_64[((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4)], outfile);
    if (pads == 2) {
        putc('=', outfile);
        putc('=', outfile);
    } else if (pads) {
        putc(basis_64[((c2 & 0xF) << 2) | ((c3 & 0xC0) >>6)], outfile);
        putc('=', outfile);
    } else {
        putc(basis_64[((c2 & 0xF) << 2) | ((c3 & 0xC0) >>6)], outfile);
        putc(basis_64[c3 & 0x3F], outfile);
    }
}

static int
PendingBoundary(char *s, char **Boundaries, int *BoundaryCt)
{
    int i, len;

    if (s[0] != '-' || s[1] != '-') return(0);


    for (i=0; i < *BoundaryCt; ++i) {
	len = strlen(Boundaries[i]);
        if (!strncmp(s, Boundaries[i], len)) {
            if (s[len] == '-' && s[len+1] == '-') *BoundaryCt = i;
            return(1);
        }
    }
    return(0);
}

/* If we're in portable newline mode, we have to convert CRLF to the 
    local newline convention on output */

static int CRpending = 0;

#ifdef NEWLINE_CHAR
static void
almostputc(int c, FILE *outfile, int PortableNewlines)
{
    if (CRpending) {
        if (c == 10) {
            putc(NEWLINE_CHAR, outfile);
            CRpending = 0;
        } else {
            putc(13, outfile);
	    if (c != 13) {
            	putc(c, outfile);
		CRpending = 0;
	    }
        }
    } else {
        if (PortableNewlines && c == 13) {
            CRpending = 1;
        } else {
            putc(c, outfile);
        }
    }
}
#else
static void
almostputc(int c, FILE *outfile, int PortableNewlines)
{
    putc(c, outfile);
}
#endif

static void
from64(FILE *infile, FILE *outfile,
       char **boundaries, int *boundaryct, int PortableNewlines) 
{
    int c1, c2, c3, c4;
    int newline = 1, DataDone = 0;

    /* always reinitialize */
    CRpending = 0;
    while ((c1 = getc(infile)) != EOF) {
        if (isspace(c1)) {
            if (c1 == '\n') {
                newline = 1;
            } else {
                newline = 0;
            }
            continue;
        }
        if (newline && boundaries && c1 == '-') {
            char Buf[200];
            /* a dash is NOT base 64, so all bets are off if NOT a boundary */
            ungetc(c1, infile);
            fgets(Buf, sizeof(Buf), infile);
            if (boundaries
                 && (Buf[0] == '-')
                 && (Buf[1] == '-')
                 && PendingBoundary(Buf, boundaries, boundaryct)) {
                return;
            }
            fprintf(stderr, "Ignoring unrecognized boundary line: %s\n", Buf);
            continue;
        }
        if (DataDone) continue;
        newline = 0;
        do {
            c2 = getc(infile);
        } while (c2 != EOF && isspace(c2));
        do {
            c3 = getc(infile);
        } while (c3 != EOF && isspace(c3));
        do {
            c4 = getc(infile);
        } while (c4 != EOF && isspace(c4));
        if (c2 == EOF || c3 == EOF || c4 == EOF) {
            fprintf(stderr, "Warning: base64 decoder saw premature EOF!\n");
            return;
        }
        if (c1 == '=' || c2 == '=') {
            DataDone=1;
            continue;
        }
        c1 = char64(c1);
        c2 = char64(c2);
        almostputc(((c1<<2) | ((c2&0x30)>>4)), outfile, PortableNewlines);
        if (c3 == '=') {
            DataDone = 1;
        } else {
            c3 = char64(c3);
            almostputc((((c2&0XF) << 4) | ((c3&0x3C) >> 2)), outfile, PortableNewlines);
            if (c4 == '=') {
                DataDone = 1;
            } else {
                c4 = char64(c4);
                almostputc((((c3&0x03) <<6) | c4), outfile, PortableNewlines);
            }
        }
    }
    if (CRpending) putc(13, outfile); /* Don't drop a lone trailing char 13 */
}

static signed char basis_hex[] = "0123456789ABCDEF";
static signed char index_hex[128] = {
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
     0, 1, 2, 3,  4, 5, 6, 7,  8, 9,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1
};

/* The following version generated complaints on Solaris. */
/* #define hexchar(c)  (((c) < 0 || (c) > 127) ? -1 : index_hex[(c)])  */
/*  Since we're no longer ever calling it with anything signed, this should work: */
#define hexchar(c)  (((c) > 127) ? -1 : index_hex[(c)])

/*
hexchar(c)
char c;
{
    char *s;
    if (islower(c)) c = toupper(c);
    s = (char *) strchr(basis_hex, c);
    if (s) return(s-basis_hex);
    return(-1);
}
*/

static void
toqp(FILE *infile, FILE *outfile) 
{
    int c, ct=0, prevc=255;
    while ((c = getc(infile)) != EOF) {
        if ((c < 32 && (c != '\n' && c != '\t'))
             || (c == '=')
             || (c >= 127)
             /* Following line is to avoid single periods alone on lines,
               which messes up some dumb smtp implementations, sigh... */
             || (ct == 0 && c == '.')) {
            putc('=', outfile);
            putc(basis_hex[c>>4], outfile);
            putc(basis_hex[c&0xF], outfile);
            ct += 3;
            prevc = 'A'; /* close enough */
        } else if (c == '\n') {
            if (prevc == ' ' || prevc == '\t') {
                putc('=', outfile); /* soft & hard lines */
                putc(c, outfile);
            }
            putc(c, outfile);
            ct = 0;
            prevc = c;
        } else {
            if (c == 'F' && prevc == '\n') {
                /* HORRIBLE but clever hack suggested by MTR for sendmail-avoidance */
                c = getc(infile);
                if (c == 'r') {
                    c = getc(infile);
                    if (c == 'o') {
                        c = getc(infile);
                        if (c == 'm') {
                            c = getc(infile);
                            if (c == ' ') {
                                /* This is the case we are looking for */
                                fputs("=46rom", outfile);
                                ct += 6;
                            } else {
                                fputs("From", outfile);
                                ct += 4;
                            }
                        } else {
                            fputs("Fro", outfile);
                            ct += 3;
                        }
                    } else {
                        fputs("Fr", outfile);
                        ct += 2;
                    }
                } else {
                    putc('F', outfile);
                    ++ct;
                }
                ungetc(c, infile);
                prevc = 'x'; /* close enough -- printable */
            } else { /* END horrible hack */
                putc(c, outfile);
                ++ct;
                prevc = c;
            }
        }
        if (ct > 72) {
            putc('=', outfile);
            putc('\n', outfile);
            ct = 0;
            prevc = '\n';
        }
    }
    if (ct) {
        putc('=', outfile);
        putc('\n', outfile);
    }
}

static void
fromqp(FILE *infile, FILE *outfile, char **boundaries, int *boundaryct) 
{
    int c1, c2;
    int sawnewline = 1, neednewline = 0;
    /* The neednewline hack is necessary because the newline leading into 
      a multipart boundary is part of the boundary, not the data */

    while ((c1 = getc(infile)) != EOF) {
        if (sawnewline && boundaries && (c1 == '-')) {
            char Buf[200];
            unsigned char *s;

            ungetc(c1, infile);
            fgets(Buf, sizeof(Buf), infile);
            if (boundaries
                 && (Buf[0] == '-')
                 && (Buf[1] == '-')
                 && PendingBoundary(Buf, boundaries, boundaryct)) {
                return;
            }
            /* Not a boundary, now we must treat THIS line as q-p, sigh */
            if (neednewline) {
                putc('\n', outfile);
                neednewline = 0;
            }
            for (s=(unsigned char *) Buf; *s; ++s) {
                if (*s == '=') {
                    if (!*++s) break;
                    if (*s == '\n') {
                        /* ignore it */
                        sawnewline = 1;
                    } else {
                        c1 = hexchar(*s);
                        if (!*++s) break;
                        c2 = hexchar(*s);
                        putc(c1<<4 | c2, outfile);
                    }
                } else {
#ifdef WIN32_NATIVE
                    if (*s == '\n')
                        putc('\r', outfile);	/* insert CR for binary-mode write */
#endif
                    putc(*s, outfile);
                }
            }
        } else {
            if (neednewline) {
                putc('\n', outfile);
                neednewline = 0;
            }
            if (c1 == '=') {
                sawnewline = 0;
                c1 = getc(infile);
                if (c1 == '\n') {
                    /* ignore it */
                    sawnewline = 1;
                } else {
                    c2 = getc(infile);
                    c1 = hexchar(c1);
                    c2 = hexchar(c2);
                    putc(c1<<4 | c2, outfile);
                    if (c2 == '\n') sawnewline = 1;
                }
            } else {
                if (c1 == '\n') {
                    sawnewline = 1;
                    neednewline = 1;
                } else {
                    sawnewline = 0;
                    putc(c1, outfile);
                }
            }
        }
    }
    if (neednewline) {
        putc('\n', outfile);
        neednewline = 0;
    }    
}


/*
Copyright (c) 1991 Bell Communications Research, Inc. (Bellcore)

Permission to use, copy, modify, and distribute this material 
for any purpose and without fee is hereby granted, provided 
that the above copyright notice and this permission notice 
appear in all copies, and that the name of Bellcore not be 
used in advertising or publicity pertaining to this 
material without the specific, prior written permission 
of an authorized representative of Bellcore.  BELLCORE 
MAKES NO REPRESENTATIONS ABOUT THE ACCURACY OR SUITABILITY 
OF THIS MATERIAL FOR ANY PURPOSE.  IT IS PROVIDED "AS IS", 
WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
*/
#ifdef WIN32_NATIVE
#include <io.h>
#include <fcntl.h>
#endif

#define BASE64 1
#define QP 2 /* quoted-printable */

int main(int argc, char *argv[])
{
    int encode = 1, which = BASE64, i, portablenewlines = 0;
    FILE *fp = stdin;
    FILE *fpo = stdout;

    for (i=1; i<argc; ++i) {
        if (argv[i][0] == '-') {
	    switch (argv[i][1]) {
		case 'o':
		    if (++i >= argc) {
			fprintf(stderr, "mimencode: -o requires a file name.\n");
			exit(-1);
		    }
		    fpo = fopen(argv[i], "w");
		    if (!fpo) {
			perror(argv[i]);
			exit(-1);
		    }
		    break;
                case 'u':
                    encode = 0;
                    break;
                case 'q':
                    which = QP;
                    break;
                case 'p':
                    portablenewlines = 1;
                    break;
                case 'b':
                    which = BASE64;
                    break;
		default:
                    fprintf(stderr,
                       "Usage: mmencode [-u] [-q] [-b] [-p] [-o outputfile] [file name]\n");
                    exit(-1);
            }
        } else {
#ifdef WIN32_NATIVE
            if (encode)
                fp = fopen(argv[i], "rb");
            else
            {
                fp = fopen(argv[i], "rt");
                setmode(fileno(fpo), O_BINARY);
            } /* else */
#else
            fp = fopen(argv[i], "r");
#endif /* WIN32_NATIVE */
            if (!fp) {
                perror(argv[i]);
                exit(-1);
            }
        }
    }
#ifdef WIN32_NATIVE
    if (fp == stdin) setmode(fileno(fp), O_BINARY);
#endif /* WIN32_NATIVE */
    if (which == BASE64) {
        if (encode) {
            to64(fp, fpo, portablenewlines);
        } else {
            from64(fp,fpo, (char **) NULL, (int *) 0, portablenewlines);
        }
    } else {
        if (encode) toqp(fp, fpo); else fromqp(fp, fpo, NULL, 0);
    }
    return(0);
}