rc2: program/lib/Roundcube/rcube_html2text.php annotate

author	Charlie Root
date	Thu, 04 Jan 2018 15:52:31 -0500
parents
children

rev	line source
0 4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	1 <?php
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	2
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	3 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	4 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	5 | This file is part of the Roundcube Webmail client |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	6 | Copyright (C) 2008-2012, The Roundcube Dev Team |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	7 | Copyright (c) 2005-2007, Jon Abernathy <jon@chuggnutt.com> |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	8 | |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	9 | Licensed under the GNU General Public License version 3 or |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	10 | any later version with exceptions for skins & plugins. |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	11 | See the README file for a full license statement. |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	12 | |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	13 | PURPOSE: |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	14 | Converts HTML to formatted plain text (based on html2text class) |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	15 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	16 | Author: Thomas Bruederli <roundcube@gmail.com> |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	17 | Author: Aleksander Machniak <alec@alec.pl> |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	18 | Author: Jon Abernathy <jon@chuggnutt.com> |
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	19 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	20 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	21
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	22 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	23 * Takes HTML and converts it to formatted, plain text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	24 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	25 * Thanks to Alexander Krug (http://www.krugar.de/) for pointing out and
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	26 * correcting an error in the regexp search array. Fixed 7/30/03.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	27 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	28 * Updated set_html() function's file reading mechanism, 9/25/03.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	29 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	30 * Thanks to Joss Sanglier (http://www.dancingbear.co.uk/) for adding
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	31 * several more HTML entity codes to the $search and $replace arrays.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	32 * Updated 11/7/03.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	33 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	34 * Thanks to Darius Kasperavicius (http://www.dar.dar.lt/) for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	35 * suggesting the addition of $allowed_tags and its supporting function
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	36 * (which I slightly modified). Updated 3/12/04.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	37 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	38 * Thanks to Justin Dearing for pointing out that a replacement for the
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	39 * <TH> tag was missing, and suggesting an appropriate fix.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	40 * Updated 8/25/04.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	41 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	42 * Thanks to Mathieu Collas (http://www.myefarm.com/) for finding a
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	43 * display/formatting bug in the _build_link_list() function: email
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	44 * readers would show the left bracket and number ("[1") as part of the
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	45 * rendered email address.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	46 * Updated 12/16/04.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	47 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	48 * Thanks to Wojciech Bajon (http://histeria.pl/) for submitting code
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	49 * to handle relative links, which I hadn't considered. I modified his
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	50 * code a bit to handle normal HTTP links and MAILTO links. Also for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	51 * suggesting three additional HTML entity codes to search for.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	52 * Updated 03/02/05.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	53 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	54 * Thanks to Jacob Chandler for pointing out another link condition
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	55 * for the _build_link_list() function: "https".
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	56 * Updated 04/06/05.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	57 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	58 * Thanks to Marc Bertrand (http://www.dresdensky.com/) for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	59 * suggesting a revision to the word wrapping functionality; if you
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	60 * specify a $width of 0 or less, word wrapping will be ignored.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	61 * Updated 11/02/06.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	62 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	63 * *** Big housecleaning updates below:
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	64 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	65 * Thanks to Colin Brown (http://www.sparkdriver.co.uk/) for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	66 * suggesting the fix to handle </li> and blank lines (whitespace).
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	67 * Christian Basedau (http://www.movetheweb.de/) also suggested the
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	68 * blank lines fix.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	69 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	70 * Special thanks to Marcus Bointon (http://www.synchromedia.co.uk/),
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	71 * Christian Basedau, Norbert Laposa (http://ln5.co.uk/),
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	72 * Bas van de Weijer, and Marijn van Butselaar
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	73 * for pointing out my glaring error in the <th> handling. Marcus also
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	74 * supplied a host of fixes.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	75 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	76 * Thanks to Jeffrey Silverman (http://www.newtnotes.com/) for pointing
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	77 * out that extra spaces should be compressed--a problem addressed with
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	78 * Marcus Bointon's fixes but that I had not yet incorporated.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	79 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	80 * Thanks to Daniel Schledermann (http://www.typoconsult.dk/) for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	81 * suggesting a valuable fix with <a> tag handling.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	82 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	83 * Thanks to Wojciech Bajon (again!) for suggesting fixes and additions,
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	84 * including the <a> tag handling that Daniel Schledermann pointed
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	85 * out but that I had not yet incorporated. I haven't (yet)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	86 * incorporated all of Wojciech's changes, though I may at some
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	87 * future time.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	88 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	89 * *** End of the housecleaning updates. Updated 08/08/07.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	90 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	91
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	92 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	93 * Converts HTML to formatted plain text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	94 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	95 * @package Framework
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	96 * @subpackage Utils
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	97 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	98 class rcube_html2text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	99 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	100 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	101 * Contains the HTML content to convert.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	102 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	103 * @var string $html
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	104 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	105 protected $html;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	106
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	107 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	108 * Contains the converted, formatted text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	109 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	110 * @var string $text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	111 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	112 protected $text;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	113
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	114 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	115 * Maximum width of the formatted text, in columns.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	116 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	117 * Set this value to 0 (or less) to ignore word wrapping
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	118 * and not constrain text to a fixed-width column.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	119 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	120 * @var integer $width
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	121 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	122 protected $width = 70;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	123
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	124 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	125 * Target character encoding for output text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	126 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	127 * @var string $charset
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	128 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	129 protected $charset = 'UTF-8';
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	130
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	131 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	132 * List of preg* regular expression patterns to search for,
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	133 * used in conjunction with $replace.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	134 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	135 * @var array $search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	136 * @see $replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	137 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	138 protected $search = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	139 '/\r/', // Non-legal carriage return
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	140 '/^.*<body[^>]*>\n*/is', // Anything before <body>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	141 '/<head[^>]*>.*?<\/head>/is', // <head>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	142 '/<script[^>]*>.*?<\/script>/is', // <script>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	143 '/<style[^>]*>.*?<\/style>/is', // <style>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	144 '/[\n\t]+/', // Newlines and tabs
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	145 '/<p[^>]*>/i', // <p>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	146 '/<\/p>[\s\n\t]*<div[^>]*>/i', // </p> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	147 '/<br[^>]*>[\s\n\t]*<div[^>]*>/i', // <br> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	148 '/<br[^>]*>\s*/i', // <br>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	149 '/<i[^>]*>(.*?)<\/i>/i', // <i>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	150 '/<em[^>]*>(.*?)<\/em>/i', // <em>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	151 '/(<ul[^>]*>|<\/ul>)/i', // <ul> and </ul>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	152 '/(<ol[^>]*>|<\/ol>)/i', // <ol> and </ol>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	153 '/<li[^>]*>(.*?)<\/li>/i', // <li> and </li>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	154 '/<li[^>]*>/i', // <li>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	155 '/<hr[^>]*>/i', // <hr>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	156 '/<div[^>]*>/i', // <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	157 '/(<table[^>]*>|<\/table>)/i', // <table> and </table>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	158 '/(<tr[^>]*>|<\/tr>)/i', // <tr> and </tr>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	159 '/<td[^>]*>(.*?)<\/td>/i', // <td> and </td>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	160 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	161
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	162 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	163 * List of pattern replacements corresponding to patterns searched.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	164 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	165 * @var array $replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	166 * @see $search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	167 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	168 protected $replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	169 '', // Non-legal carriage return
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	170 '', // Anything before <body>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	171 '', // <head>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	172 '', // <script>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	173 '', // <style>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	174 ' ', // Newlines and tabs
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	175 "\n\n", // <p>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	176 "\n<div>", // </p> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	177 '<div>', // <br> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	178 "\n", // <br>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	179 '_\\1_', // <i>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	180 '_\\1_', // <em>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	181 "\n\n", // <ul> and </ul>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	182 "\n\n", // <ol> and </ol>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	183 "\t* \\1\n", // <li> and </li>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	184 "\n\t* ", // <li>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	185 "\n-------------------------\n", // <hr>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	186 "<div>\n", // <div>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	187 "\n\n", // <table> and </table>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	188 "\n", // <tr> and </tr>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	189 "\t\t\\1\n", // <td> and </td>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	190 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	191
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	192 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	193 * List of preg* regular expression patterns to search for,
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	194 * used in conjunction with $ent_replace.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	195 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	196 * @var array $ent_search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	197 * @see $ent_replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	198 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	199 protected $ent_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	200 '/&(nbsp|#160);/i', // Non-breaking space
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	201 '/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	202 // Double quotes
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	203 '/&(apos|rsquo|lsquo|#8216|#8217);/i', // Single quotes
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	204 '/&gt;/i', // Greater-than
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	205 '/&lt;/i', // Less-than
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	206 '/&(copy|#169);/i', // Copyright
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	207 '/&(trade|#8482|#153);/i', // Trademark
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	208 '/&(reg|#174);/i', // Registered
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	209 '/&(mdash|#151|#8212);/i', // mdash
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	210 '/&(ndash|minus|#8211|#8722);/i', // ndash
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	211 '/&(bull|#149|#8226);/i', // Bullet
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	212 '/&(pound|#163);/i', // Pound sign
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	213 '/&(euro|#8364);/i', // Euro sign
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	214 '/&(amp|#38);/i', // Ampersand: see _converter()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	215 '/[ ]{2,}/', // Runs of spaces, post-handling
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	216 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	217
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	218 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	219 * List of pattern replacements corresponding to patterns searched.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	220 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	221 * @var array $ent_replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	222 * @see $ent_search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	223 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	224 protected $ent_replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	225 "\xC2\xA0", // Non-breaking space
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	226 '"', // Double quotes
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	227 "'", // Single quotes
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	228 '>',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	229 '<',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	230 '(c)',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	231 '(tm)',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	232 '(R)',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	233 '--',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	234 '-',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	235 '*',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	236 '£',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	237 'EUR', // Euro sign. €
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	238 '|+|amp|+|', // Ampersand: see _converter()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	239 ' ', // Runs of spaces, post-handling
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	240 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	241
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	242 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	243 * List of preg* regular expression patterns to search for
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	244 * and replace using callback function.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	245 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	246 * @var array $callback_search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	247 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	248 protected $callback_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	249 '/<(a) [^>]*href=("|\')([^"\']+)\2[^>]*>(.*?)<\/a>/i', // <a href="">
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	250 '/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i', // h1 - h6
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	251 '/<(b)( [^>]*)?>(.*?)<\/b>/i', // <b>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	252 '/<(strong)( [^>]*)?>(.*?)<\/strong>/i', // <strong>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	253 '/<(th)( [^>]*)?>(.*?)<\/th>/i', // <th> and </th>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	254 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	255
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	256 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	257 * List of preg* regular expression patterns to search for in PRE body,
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	258 * used in conjunction with $pre_replace.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	259 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	260 * @var array $pre_search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	261 * @see $pre_replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	262 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	263 protected $pre_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	264 "/\n/",
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	265 "/\t/",
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	266 '/ /',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	267 '/<pre[^>]*>/',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	268 '/<\/pre>/'
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	269 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	270
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	271 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	272 * List of pattern replacements corresponding to patterns searched for PRE body.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	273 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	274 * @var array $pre_replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	275 * @see $pre_search
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	276 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	277 protected $pre_replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	278 '<br>',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	279 '&nbsp;&nbsp;&nbsp;&nbsp;',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	280 '&nbsp;',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	281 '',
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	282 ''
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	283 );
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	284
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	285 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	286 * Contains a list of HTML tags to allow in the resulting text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	287 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	288 * @var string $allowed_tags
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	289 * @see set_allowed_tags()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	290 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	291 protected $allowed_tags = '';
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	292
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	293 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	294 * Contains the base URL that relative links should resolve to.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	295 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	296 * @var string $url
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	297 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	298 protected $url;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	299
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	300 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	301 * Indicates whether content in the $html variable has been converted yet.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	302 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	303 * @var boolean $_converted
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	304 * @see $html, $text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	305 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	306 protected $_converted = false;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	307
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	308 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	309 * Contains URL addresses from links to be rendered in plain text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	310 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	311 * @var array $_link_list
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	312 * @see _build_link_list()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	313 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	314 protected $_link_list = array();
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	315
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	316 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	317 * Boolean flag, true if a table of link URLs should be listed after the text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	318 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	319 * @var boolean $_do_links
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	320 * @see __construct()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	321 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	322 protected $_do_links = true;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	323
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	324 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	325 * Constructor.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	326 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	327 * If the HTML source string (or file) is supplied, the class
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	328 * will instantiate with that source loaded; all that remains
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	329 * to be done is to call get_text().
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	330 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	331 * @param string $source HTML content
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	332 * @param boolean $from_file Indicates $source is a file to pull content from
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	333 * @param boolean $do_links Indicates whether a table of link URLs is desired
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	334 * @param integer $width Maximum width of the formatted text, 0 for no limit
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	335 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	336 function __construct($source = '', $from_file = false, $do_links = true, $width = 75, $charset = 'UTF-8')
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	337 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	338 if (!empty($source)) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	339 $this->set_html($source, $from_file);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	340 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	341
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	342 $this->set_base_url();
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	343
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	344 $this->_do_links = $do_links;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	345 $this->width = $width;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	346 $this->charset = $charset;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	347 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	348
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	349 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	350 * Loads source HTML into memory, either from $source string or a file.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	351 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	352 * @param string $source HTML content
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	353 * @param boolean $from_file Indicates $source is a file to pull content from
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	354 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	355 function set_html($source, $from_file = false)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	356 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	357 if ($from_file && file_exists($source)) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	358 $this->html = file_get_contents($source);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	359 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	360 else {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	361 $this->html = $source;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	362 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	363
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	364 $this->_converted = false;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	365 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	366
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	367 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	368 * Returns the text, converted from HTML.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	369 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	370 * @return string Plain text
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	371 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	372 function get_text()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	373 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	374 if (!$this->_converted) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	375 $this->_convert();
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	376 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	377
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	378 return $this->text;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	379 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	380
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	381 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	382 * Prints the text, converted from HTML.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	383 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	384 function print_text()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	385 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	386 print $this->get_text();
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	387 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	388
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	389 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	390 * Sets the allowed HTML tags to pass through to the resulting text.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	391 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	392 * Tags should be in the form "<p>", with no corresponding closing tag.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	393 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	394 function set_allowed_tags($allowed_tags = '')
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	395 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	396 if (!empty($allowed_tags)) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	397 $this->allowed_tags = $allowed_tags;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	398 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	399 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	400
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	401 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	402 * Sets a base URL to handle relative links.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	403 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	404 function set_base_url($url = '')
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	405 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	406 if (empty($url)) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	407 if (!empty($_SERVER['HTTP_HOST'])) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	408 $this->url = 'http://' . $_SERVER['HTTP_HOST'];
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	409 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	410 else {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	411 $this->url = '';
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	412 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	413 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	414 else {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	415 // Strip any trailing slashes for consistency (relative
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	416 // URLs may already start with a slash like "/file.html")
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	417 if (substr($url, -1) == '/') {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	418 $url = substr($url, 0, -1);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	419 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	420 $this->url = $url;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	421 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	422 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	423
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	424 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	425 * Runs the conversion (via the _converter() method) and appends the link list.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	426 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	427 protected function _convert()
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	428 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	429 // Variables used for building the link list
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	430 $this->_link_list = array();
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	431
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	432 $text = $this->html;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	433
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	434 // Convert HTML to TXT
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	435 $this->_converter($text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	436
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	437 // Add link list
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	438 if (!empty($this->_link_list)) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	439 $text .= "\n\nLinks:\n------\n";
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	440 foreach ($this->_link_list as $idx => $url) {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	441 $text .= '[' . ($idx+1) . '] ' . $url . "\n";
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	442 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	443 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	444
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	445 $this->text = $text;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	446 $this->_converted = true;
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	447 }
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	448
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	449 /**
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	450 * Workhorse function that does actual conversion.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	451 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	452 * First performs custom tag replacement specified by $search and
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	453 * $replace arrays. Then strips any remaining HTML tags, reduces whitespace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	454 * and newlines to a readable format, and word wraps the text to
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	455 * $width characters.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	456 *
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	457 * @param string &$text Reference to HTML content string
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	458 */
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	459 protected function _converter(&$text)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	460 {
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	461 // Convert <BLOCKQUOTE> (before PRE!)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	462 $this->_convert_blockquotes($text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	463
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	464 // Convert <PRE>
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	465 $this->_convert_pre($text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	466
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	467 // Run our defined tags search-and-replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	468 $text = preg_replace($this->search, $this->replace, $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	469
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	470 // Run our defined tags search-and-replace with callback
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	471 $text = preg_replace_callback($this->callback_search, array($this, 'tags_preg_callback'), $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	472
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	473 // Strip any other HTML tags
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	474 $text = strip_tags($text, $this->allowed_tags);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	475
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	476 // Run our defined entities/characters search-and-replace
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	477 $text = preg_replace($this->ent_search, $this->ent_replace, $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	478
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	479 // Replace known html entities
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	480 $text = html_entity_decode($text, ENT_QUOTES, $this->charset);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	481
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	482 // Replace Unicode nbsp with regular spaces
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	483 $text = preg_replace('/\xC2\xA0/', ' ', $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	484
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	485 // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	486 $text = preg_replace('/&([a-zA-Z0-9]{2,6}|#[0-9]{2,4});/', '', $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	487
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	488 // Convert "|+|amp|+|" into "&"; this needs to be done after handling of unknown entities.
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	489 // This properly handles the situation of "&amp;quot;" in the input string
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	490 $text = str_replace('|+|amp|+|', '&', $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	491
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	492 // Bring down number of empty lines to 2 max
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	493 $text = preg_replace("/\n\s+\n/", "\n\n", $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	494 $text = preg_replace("/[\n]{3,}/", "\n\n", $text);
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	495
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	496 // Remove leading empty lines (can be produced by e.g. a P tag at the beginning)
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	497 $text = ltrim($text, "\n");
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	498
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	499 // Wrap the text to a readable format
4681f974d28b vanilla 1.3.3 distro, I hope Charlie Root parents: diff changeset	500 // for PHP versions >= 4.0.2. Default width is 75


90 */

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

91

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

92 /**

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

93 * Converts HTML to formatted plain text

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

94 *

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

95 * @package Framework

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

96 * @subpackage Utils

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

97 */

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

class rcube_html2text
{
    /**
     * Contains the HTML content to convert.
     *
     * @var string $html
     */
    protected $html;

    /**
     * Contains the converted, formatted text.
     *
     * @var string $text
     */
    protected $text;

    /**
     * Maximum width of the formatted text, in columns.
     *
     * Set this value to 0 (or less) to ignore word wrapping
     * and not constrain text to a fixed-width column.
     *
     * @var integer $width
     */
    protected $width = 70;

    /**
     * Target character encoding for output text
     *
     * @var string $charset
     */
    protected $charset = 'UTF-8';

    /**
     * List of preg* regular expression patterns to search for,
     * used in conjunction with $replace.
     *
     * @var array $search
     * @see $replace
     */
    protected $search = array(
        '/\r/',                                  // Non-legal carriage return
        '/^.*<body[^>]*>\n*/is',                 // Anything before <body>
        '/<head[^>]*>.*?<\/head>/is',            // <head>
        '/<script[^>]*>.*?<\/script>/is',        // <script>
        '/<style[^>]*>.*?<\/style>/is',          // <style>
        '/[\n\t]+/',                             // Newlines and tabs
        '/<p[^>]*>/i',                           // <p>
        '/<\/p>[\s\n\t]*<div[^>]*>/i',           // </p> before <div>
        '/<br[^>]*>[\s\n\t]*<div[^>]*>/i',       // <br> before <div>
        '/<br[^>]*>\s*/i',                       // <br>
        '/<i[^>]*>(.*?)<\/i>/i',                 // <i>
        '/<em[^>]*>(.*?)<\/em>/i',               // <em>
        '/(<ul[^>]*>|<\/ul>)/i',                 // <ul> and </ul>
        '/(<ol[^>]*>|<\/ol>)/i',                 // <ol> and </ol>
        '/<li[^>]*>(.*?)<\/li>/i',               // <li> and </li>
        '/<li[^>]*>/i',                          // <li>
        '/<hr[^>]*>/i',                          // <hr>
        '/<div[^>]*>/i',                         // <div>
        '/(<table[^>]*>|<\/table>)/i',           // <table> and </table>
        '/(<tr[^>]*>|<\/tr>)/i',                 // <tr> and </tr>
        '/<td[^>]*>(.*?)<\/td>/i',               // <td> and </td>
    );

    /**
     * List of pattern replacements corresponding to patterns searched.
     *
     * @var array $replace
     * @see $search
     */
    protected $replace = array(
        '',                                      // Non-legal carriage return
        '',                                      // Anything before <body>
        '',                                      // <head>
        '',                                      // <script>
        '',                                      // <style>
        ' ',                                     // Newlines and tabs
        "\n\n",                                  // <p>
        "\n<div>",                               // </p> before <div>
        '<div>',                                 // <br> before <div>
        "\n",                                    // <br>
        '_\\1_',                                 // <i>
        '_\\1_',                                 // <em>
        "\n\n",                                  // <ul> and </ul>
        "\n\n",                                  // <ol> and </ol>
        "\t* \\1\n",                             // <li> and </li>
        "\n\t* ",                                // <li>
        "\n-------------------------\n",         // <hr>
        "<div>\n",                               // <div>
        "\n\n",                                  // <table> and </table>
        "\n",                                    // <tr> and </tr>
        "\t\t\\1\n",                             // <td> and </td>
    );

    /**
     * List of preg* regular expression patterns to search for,
     * used in conjunction with $ent_replace.
     *
     * @var array $ent_search
     * @see $ent_replace
     */
    protected $ent_search = array(
        '/&(nbsp|#160);/i',                      // Non-breaking space
        '/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i',
                                                 // Double quotes
        '/&(apos|rsquo|lsquo|#8216|#8217);/i',   // Single quotes
        '/&gt;/i',                               // Greater-than
        '/&lt;/i',                               // Less-than
        '/&(copy|#169);/i',                      // Copyright
        '/&(trade|#8482|#153);/i',               // Trademark
        '/&(reg|#174);/i',                       // Registered
        '/&(mdash|#151|#8212);/i',               // mdash
        '/&(ndash|minus|#8211|#8722);/i',        // ndash
        '/&(bull|#149|#8226);/i',                // Bullet
        '/&(pound|#163);/i',                     // Pound sign
        '/&(euro|#8364);/i',                     // Euro sign
        '/&(amp|#38);/i',                        // Ampersand: see _converter()
        '/[ ]{2,}/',                             // Runs of spaces, post-handling
    );

    /**
     * List of pattern replacements corresponding to patterns searched.
     *
     * @var array $ent_replace
     * @see $ent_search
     */
    protected $ent_replace = array(
        "\xC2\xA0",                              // Non-breaking space
        '"',                                     // Double quotes
        "'",                                     // Single quotes
        '>',
        '<',
        '(c)',
        '(tm)',
        '(R)',
        '--',
        '-',
        '*',
        '£',
        'EUR',                                   // Euro sign. €
        '|+|amp|+|',                             // Ampersand: see _converter()
        ' ',                                     // Runs of spaces, post-handling
    );

    /**
     * List of preg* regular expression patterns to search for
     * and replace using callback function.
     *
     * @var array $callback_search
     */
    protected $callback_search = array(
        '/<(a) [^>]*href=("|\')([^"\']+)\2[^>]*>(.*?)<\/a>/i', // <a href="">
        '/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i',         // h1 - h6
        '/<(b)( [^>]*)?>(.*?)<\/b>/i',                         // <b>
        '/<(strong)( [^>]*)?>(.*?)<\/strong>/i',               // <strong>
        '/<(th)( [^>]*)?>(.*?)<\/th>/i',                       // <th> and </th>
    );

    /**
     * List of preg* regular expression patterns to search for in PRE body,
     * used in conjunction with $pre_replace.
     *
     * @var array $pre_search
     * @see $pre_replace
     */
    protected $pre_search = array(
        "/\n/",
        "/\t/",
        '/ /',
        '/<pre[^>]*>/',
        '/<\/pre>/'
    );

    /**
     * List of pattern replacements corresponding to patterns searched for PRE body.
     *
     * @var array $pre_replace
     * @see $pre_search
     */
    protected $pre_replace = array(
        '<br>',
        '&nbsp;&nbsp;&nbsp;&nbsp;',
        '&nbsp;',
        '',
        ''
    );

    /**
     * Contains a list of HTML tags to allow in the resulting text.
     *
     * @var string $allowed_tags
     * @see set_allowed_tags()
     */
    protected $allowed_tags = '';

    /**
     * Contains the base URL that relative links should resolve to.
     *
     * @var string $url
     */
    protected $url;

    /**
     * Indicates whether content in the $html variable has been converted yet.
     *
     * @var boolean $_converted
     * @see $html, $text
     */
    protected $_converted = false;

    /**
     * Contains URL addresses from links to be rendered in plain text.
     *
     * @var array $_link_list
     * @see _build_link_list()
     */
    protected $_link_list = array();

    /**
     * Boolean flag, true if a table of link URLs should be listed after the text.
     *
     * @var boolean $_do_links
     * @see __construct()
     */
    protected $_do_links = true;

    /**
     * Constructor.
     *
     * If the HTML source string (or file) is supplied, the class
     * will instantiate with that source propagated; all that has
     * to be done is to call get_text().
     *
     * @param string  $source    HTML content
     * @param boolean $from_file Indicates $source is a file to pull content from
     * @param boolean $do_links  Indicate whether a table of link URLs is desired
     * @param integer $width     Maximum width of the formatted text, 0 for no limit
     * @param string  $charset   Target character encoding for output text
     */
    function __construct($source = '', $from_file = false, $do_links = true, $width = 75, $charset = 'UTF-8')
    {
        if (!empty($source)) {
            $this->set_html($source, $from_file);
        }

        $this->set_base_url();

        $this->_do_links = $do_links;
        $this->width     = $width;
        $this->charset   = $charset;
    }

    /**
     * Loads source HTML into memory, either from $source string or a file.
     *
     * @param string  $source    HTML content
     * @param boolean $from_file Indicates $source is a file to pull content from
     */
355 function set_html($source, $from_file = false)

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

356 {

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

357 if ($from_file && file_exists($source)) {

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

358 $this->html = file_get_contents($source);

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

359 }

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

360 else {

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

361 $this->html = $source;

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

362 }

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

363

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

364 $this->_converted = false;

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

365 }

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

366

    /**
     * Returns the text, converted from HTML.
     *
     * @return string Plain text
     */
    function get_text()
    {
        if (!$this->_converted) {
            $this->_convert();
        }

        return $this->text;
    }

    /**
     * Prints the text, converted from HTML.
     */
    function print_text()
    {
        print $this->get_text();
    }

    /**
     * Sets the allowed HTML tags to pass through to the resulting text.
     *
     * Tags should be in the form "<p>", with no corresponding closing tag.
     */
    function set_allowed_tags($allowed_tags = '')
    {
        if (!empty($allowed_tags)) {
            $this->allowed_tags = $allowed_tags;
        }
    }

    /**
     * Sets a base URL to handle relative links.
     */
    function set_base_url($url = '')
    {
        if (empty($url)) {
            if (!empty($_SERVER['HTTP_HOST'])) {
                $this->url = 'http://' . $_SERVER['HTTP_HOST'];
            }
            else {
                $this->url = '';
            }
        }
        else {
            // Strip any trailing slashes for consistency (relative
            // URLs may already start with a slash like "/file.html")
            if (substr($url, -1) == '/') {
                $url = substr($url, 0, -1);
            }
            $this->url = $url;
        }
    }

    /**
     * Workhorse function that does actual conversion (calls _converter() method).
     */
    protected function _convert()
    {
        // Variables used for building the link list
        $this->_link_list = array();

        $text = $this->html;

        // Convert HTML to TXT
        $this->_converter($text);

        // Add link list
        if (!empty($this->_link_list)) {
            $text .= "\n\nLinks:\n------\n";
            foreach ($this->_link_list as $idx => $url) {
                $text .= '[' . ($idx+1) . '] ' . $url . "\n";
            }
        }

        $this->text = $text;
        $this->_converted = true;
    }

    /**
     * Workhorse function that does actual conversion.
     *
     * First performs custom tag replacement specified by $search and
     * $replace arrays. Then strips any remaining HTML tags, reduces whitespace
     * and newlines to a readable format, and word wraps the text to
     * $width characters.
     *
     * @param string &$text Reference to HTML content string
     */
    protected function _converter(&$text)
    {
        // Convert <BLOCKQUOTE> (before PRE!)
        $this->_convert_blockquotes($text);

        // Convert <PRE>
        $this->_convert_pre($text);

        // Run our defined tags search-and-replace
        $text = preg_replace($this->search, $this->replace, $text);

        // Run our defined tags search-and-replace with callback
        $text = preg_replace_callback($this->callback_search, array($this, 'tags_preg_callback'), $text);

        // Strip any other HTML tags
        $text = strip_tags($text, $this->allowed_tags);

        // Run our defined entities/characters search-and-replace
        $text = preg_replace($this->ent_search, $this->ent_replace, $text);

        // Replace known HTML entities
        $text = html_entity_decode($text, ENT_QUOTES, $this->charset);

        // Replace Unicode nbsp with regular spaces
        $text = preg_replace('/\xC2\xA0/', ' ', $text);

        // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
        $text = preg_replace('/&([a-zA-Z0-9]{2,6}|#[0-9]{2,4});/', '', $text);

        // Convert "|+|amp|+|" into "&"; this needs to be done after handling of unknown entities.
        // This properly handles the case of "&quot;" in the input string
        $text = str_replace('|+|amp|+|', '&', $text);

        // Bring down number of empty lines to 2 max
        $text = preg_replace("/\n\s+\n/", "\n\n", $text);
        $text = preg_replace("/[\n]{3,}/", "\n\n", $text);

        // Remove leading empty lines (can be produced by e.g. a P tag at the beginning)
        $text = ltrim($text, "\n");

        // Wrap the text to a readable format
        // for PHP versions >= 4.0.2. Default width is 75
        // If width is 0 or less, don't wrap the text.
        if ( $this->width > 0 ) {
            $text = wordwrap($text, $this->width);
        }
    }

    /**
     * Helper function called by preg_replace() on link replacement.
     *
     * Maintains an internal list of links to be displayed at the end of the
     * text, with numeric indices to the original point in the text they
     * appeared. Also makes an effort at identifying and handling absolute
     * and relative links.
     *
     * @param string $link URL of the link
     * @param string $display Part of the text to associate number with
     */
    protected function _build_link_list($link, $display)
    {
        if (!$this->_do_links || empty($link)) {
            return $display;
        }

        // Ignored link types
        if (preg_match('!^(javascript:|mailto:|#)!i', $link)) {
            return $display;
        }

        // skip links with href == content (#1490434)
        if ($link === $display) {
            return $display;
        }

        if (preg_match('!^([a-z][a-z0-9.+-]+:)!i', $link)) {
            $url = $link;
        }
        else {
            $url = $this->url;
            if (substr($link, 0, 1) != '/') {
                $url .= '/';
            }
            $url .= "$link";
        }

        if (($index = array_search($url, $this->_link_list)) === false) {
            $index = count($this->_link_list);
            $this->_link_list[] = $url;
        }

        return $display . ' [' . ($index+1) . ']';
    }

    /**
     * Helper function for PRE body conversion.
     *
     * @param string &$text HTML content
     */
    protected function _convert_pre(&$text)
    {
        // get the content of PRE element
        while (preg_match('/<pre[^>]*>(.*)<\/pre>/ismU', $text, $matches)) {
            $this->pre_content = $matches[1];

            // Run our defined tags search-and-replace with callback
            $this->pre_content = preg_replace_callback($this->callback_search,
                array($this, 'tags_preg_callback'), $this->pre_content);

            // convert the content
            $this->pre_content = sprintf('<div><br>%s<br></div>',
                preg_replace($this->pre_search, $this->pre_replace, $this->pre_content));

            // replace the content (use callback because content can contain $0 variable)
            $text = preg_replace_callback('/<pre[^>]*>.*<\/pre>/ismU',
                array($this, 'pre_preg_callback'), $text, 1);

            // free memory
            $this->pre_content = '';
        }
    }

    /**
     * Helper function for BLOCKQUOTE body conversion.
     *
     * @param string &$text HTML content
     */
    protected function _convert_blockquotes(&$text)
    {
        $level = 0;
        $offset = 0;
        while (($start = stripos($text, '<blockquote', $offset)) !== false) {
            $offset = $start + 12;
            do {
                $end = stripos($text, '</blockquote>', $offset);
                $next = stripos($text, '<blockquote', $offset);

                // nested <blockquote>, skip
                if ($next !== false && $next < $end) {
                    $offset = $next + 12;
                    $level++;
                }
                // nested </blockquote> tag
                if ($end !== false && $level > 0) {
                    $offset = $end + 12;
                    $level--;
                }
                // found matching end tag
                else if ($end !== false && $level == 0) {
                    $taglen = strpos($text, '>', $start) - $start;
                    $startpos = $start + $taglen + 1;

                    // get blockquote content
                    $body = trim(substr($text, $startpos, $end - $startpos));

                    // adjust text wrapping width
                    $p_width = $this->width;
                    if ($this->width > 0) $this->width -= 2;

                    // replace content with inner blockquotes
                    $this->_converter($body);

                    // restore text width
                    $this->width = $p_width;

                    // Add citation markers and create <pre> block
                    $body = preg_replace_callback('/((?:^|\n)>*)([^\n]*)/', array($this, 'blockquote_citation_callback'), trim($body));
                    $body = '<pre>' . htmlspecialchars($body) . '</pre>';

                    $text = substr_replace($text, $body . "\n", $start, $end + 13 - $start);
                    $offset = 0;

                    break;
                }
                // abort on invalid tag structure (e.g. no closing tag found)
                else {
                    break;
                }
            }
            while ($end || $next);
        }
    }

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

642 /**

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

643 * Callback function to correctly add citation markers for blockquote contents

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

644 */

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

645 public function blockquote_citation_callback($m)

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

646 {

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

647 $line = ltrim($m[2]);

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

648 $space = $line[0] == '>' ? '' : ' ';

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

649

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

650 return $m[1] . '>' . $space . $line;

4681f974d28b vanilla 1.3.3 distro, I hope

Charlie Root

parents:

diff changeset

651 }

    /**
     * Callback function for preg_replace_callback use.
     *
     * @param array $matches PREG matches
     * @return string
     */
    public function tags_preg_callback($matches)
    {
        switch (strtolower($matches[1])) {
        case 'b':
        case 'strong':
            return $this->_toupper($matches[3]);
        case 'th':
            return $this->_toupper("\t\t". $matches[3] ."\n");
        case 'h':
            return $this->_toupper("\n\n". $matches[3] ."\n\n");
        case 'a':
            // Remove spaces in URL (#1487805)
            $url = str_replace(' ', '', $matches[3]);
            return $this->_build_link_list($url, $matches[4]);
        }
    }

    /**
     * Callback function for preg_replace_callback use in PRE content handler.
     *
     * @param array $matches PREG matches
     * @return string
     */
    public function pre_preg_callback($matches)
    {
        return $this->pre_content;
    }

    /**
     * Strtoupper function with HTML tags and entities handling.
     *
     * @param string $str Text to convert
     * @return string Converted text
     */
    private function _toupper($str)
    {
        // string can contain HTML tags
        $chunks = preg_split('/(<[^>]*>)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

        // uppercase only the text between HTML tags
        foreach ($chunks as $idx => $chunk) {
            if ($chunk[0] != '<') {
                $chunks[$idx] = $this->_strtoupper($chunk);
            }
        }

        return implode($chunks);
    }

    /**
     * Strtoupper multibyte wrapper function with HTML entities handling.
     *
     * @param string $str Text to convert
     * @return string Converted text
     */
    private function _strtoupper($str)
    {
        $str = html_entity_decode($str, ENT_COMPAT, $this->charset);
        $str = mb_strtoupper($str);
        $str = htmlspecialchars($str, ENT_COMPAT, $this->charset);

        return $str;
    }
}
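
The two helper techniques above (citation markers for quoted lines, and uppercasing only the text outside HTML tags) can be tried outside the class with a rough standalone sketch. Everything here is an assumption made for illustration: the function names `add_citation_marker` and `upper_outside_tags` are hypothetical, the line-matching pattern is assumed rather than taken from the listing above, `UTF-8` stands in for `$this->charset`, and the `mbstring` extension is assumed to be available.

```php
<?php

// Hypothetical standalone variant of the citation callback (not the class method).
function add_citation_marker(array $m): string
{
    $line = ltrim($m[2]);
    // A line that already starts with '>' gets the new marker without a space.
    $space = isset($line[0]) && $line[0] === '>' ? '' : ' ';

    return $m[1] . '>' . $space . $line;
}

// Assumed pattern: group 1 is the line start plus any existing '>' markers,
// group 2 is the rest of the line.
$pattern = '/((?:^|\n)>*)([^\n]*)/';
$quoted  = preg_replace_callback($pattern, 'add_citation_marker', "Hello\n> nested reply");

// Hypothetical standalone variant of the tag-aware uppercase helpers.
function upper_outside_tags(string $str): string
{
    // Split on tags but keep each tag as its own chunk.
    $chunks = preg_split('/(<[^>]*>)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

    foreach ($chunks as $idx => $chunk) {
        if ($chunk[0] !== '<') {
            // Decode entities first so "&amp;" does not turn into "&AMP;".
            $text = html_entity_decode($chunk, ENT_COMPAT, 'UTF-8');
            $chunks[$idx] = htmlspecialchars(mb_strtoupper($text, 'UTF-8'), ENT_COMPAT, 'UTF-8');
        }
    }

    return implode('', $chunks);
}

echo $quoted, "\n";
echo upper_outside_tags('Tom &amp; <i>Jerry</i>'), "\n";
```

Under these assumptions, each line gains one more `>` level, and tag names and entities are left as they were while the visible text is uppercased.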

annotate program/lib/Roundcube/rcube_html2text.php @ 0:4681f974d28b