annotate program/lib/Roundcube/rcube_html2text.php @ 0:4681f974d28b

vanilla 1.3.3 distro, I hope
author Charlie Root
date Thu, 04 Jan 2018 15:52:31 -0500
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
1 <?php
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
2
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
3 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
4 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
5 | This file is part of the Roundcube Webmail client |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
6 | Copyright (C) 2008-2012, The Roundcube Dev Team |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
7 | Copyright (c) 2005-2007, Jon Abernathy <jon@chuggnutt.com> |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
8 | |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
9 | Licensed under the GNU General Public License version 3 or |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
10 | any later version with exceptions for skins & plugins. |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
11 | See the README file for a full license statement. |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
12 | |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
13 | PURPOSE: |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
14 | Converts HTML to formatted plain text (based on html2text class) |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
15 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
16 | Author: Thomas Bruederli <roundcube@gmail.com> |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
17 | Author: Aleksander Machniak <alec@alec.pl> |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
18 | Author: Jon Abernathy <jon@chuggnutt.com> |
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
19 +-----------------------------------------------------------------------+
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
20 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
21
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
22 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
23 * Takes HTML and converts it to formatted, plain text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
24 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
25 * Thanks to Alexander Krug (http://www.krugar.de/) to pointing out and
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
26 * correcting an error in the regexp search array. Fixed 7/30/03.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
27 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
28 * Updated set_html() function's file reading mechanism, 9/25/03.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
29 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
30 * Thanks to Joss Sanglier (http://www.dancingbear.co.uk/) for adding
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
31 * several more HTML entity codes to the $search and $replace arrays.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
32 * Updated 11/7/03.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
33 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
34 * Thanks to Darius Kasperavicius (http://www.dar.dar.lt/) for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
35 * suggesting the addition of $allowed_tags and its supporting function
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
36 * (which I slightly modified). Updated 3/12/04.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
37 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
38 * Thanks to Justin Dearing for pointing out that a replacement for the
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
39 * <TH> tag was missing, and suggesting an appropriate fix.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
40 * Updated 8/25/04.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
41 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
42 * Thanks to Mathieu Collas (http://www.myefarm.com/) for finding a
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
43 * display/formatting bug in the _build_link_list() function: email
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
44 * readers would show the left bracket and number ("[1") as part of the
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
45 * rendered email address.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
46 * Updated 12/16/04.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
47 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
48 * Thanks to Wojciech Bajon (http://histeria.pl/) for submitting code
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
49 * to handle relative links, which I hadn't considered. I modified his
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
50 * code a bit to handle normal HTTP links and MAILTO links. Also for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
51 * suggesting three additional HTML entity codes to search for.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
52 * Updated 03/02/05.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
53 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
54 * Thanks to Jacob Chandler for pointing out another link condition
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
55 * for the _build_link_list() function: "https".
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
56 * Updated 04/06/05.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
57 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
58 * Thanks to Marc Bertrand (http://www.dresdensky.com/) for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
59 * suggesting a revision to the word wrapping functionality; if you
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
60 * specify a $width of 0 or less, word wrapping will be ignored.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
61 * Updated 11/02/06.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
62 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
63 * *** Big housecleaning updates below:
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
64 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
65 * Thanks to Colin Brown (http://www.sparkdriver.co.uk/) for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
66 * suggesting the fix to handle </li> and blank lines (whitespace).
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
67 * Christian Basedau (http://www.movetheweb.de/) also suggested the
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
68 * blank lines fix.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
69 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
70 * Special thanks to Marcus Bointon (http://www.synchromedia.co.uk/),
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
71 * Christian Basedau, Norbert Laposa (http://ln5.co.uk/),
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
72 * Bas van de Weijer, and Marijn van Butselaar
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
73 * for pointing out my glaring error in the <th> handling. Marcus also
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
74 * supplied a host of fixes.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
75 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
76 * Thanks to Jeffrey Silverman (http://www.newtnotes.com/) for pointing
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
77 * out that extra spaces should be compressed--a problem addressed with
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
78 * Marcus Bointon's fixes but that I had not yet incorporated.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
79 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
80 * Thanks to Daniel Schledermann (http://www.typoconsult.dk/) for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
81 * suggesting a valuable fix with <a> tag handling.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
82 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
83 * Thanks to Wojciech Bajon (again!) for suggesting fixes and additions,
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
84 * including the <a> tag handling that Daniel Schledermann pointed
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
85 * out but that I had not yet incorporated. I haven't (yet)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
86 * incorporated all of Wojciech's changes, though I may at some
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
87 * future time.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
88 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
89 * *** End of the housecleaning updates. Updated 08/08/07.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
90 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
91
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
92 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
93 * Converts HTML to formatted plain text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
94 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
95 * @package Framework
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
96 * @subpackage Utils
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
97 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
98 class rcube_html2text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
99 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
100 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
101 * Contains the HTML content to convert.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
102 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
103 * @var string $html
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
104 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
105 protected $html;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
106
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
107 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
108 * Contains the converted, formatted text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
109 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
110 * @var string $text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
111 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
112 protected $text;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
113
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
114 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
115 * Maximum width of the formatted text, in columns.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
116 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
117 * Set this value to 0 (or less) to ignore word wrapping
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
118 * and not constrain text to a fixed-width column.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
119 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
120 * @var integer $width
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
121 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
122 protected $width = 70;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
123
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
124 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
125 * Target character encoding for output text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
126 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
127 * @var string $charset
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
128 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
129 protected $charset = 'UTF-8';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
130
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
131 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
132 * List of preg* regular expression patterns to search for,
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
133 * used in conjunction with $replace.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
134 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
135 * @var array $search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
136 * @see $replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
137 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
138 protected $search = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
139 '/\r/', // Non-legal carriage return
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
140 '/^.*<body[^>]*>\n*/is', // Anything before <body>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
141 '/<head[^>]*>.*?<\/head>/is', // <head>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
142 '/<script[^>]*>.*?<\/script>/is', // <script>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
143 '/<style[^>]*>.*?<\/style>/is', // <style>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
144 '/[\n\t]+/', // Newlines and tabs
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
145 '/<p[^>]*>/i', // <p>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
146 '/<\/p>[\s\n\t]*<div[^>]*>/i', // </p> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
147 '/<br[^>]*>[\s\n\t]*<div[^>]*>/i', // <br> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
148 '/<br[^>]*>\s*/i', // <br>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
149 '/<i[^>]*>(.*?)<\/i>/i', // <i>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
150 '/<em[^>]*>(.*?)<\/em>/i', // <em>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
151 '/(<ul[^>]*>|<\/ul>)/i', // <ul> and </ul>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
152 '/(<ol[^>]*>|<\/ol>)/i', // <ol> and </ol>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
153 '/<li[^>]*>(.*?)<\/li>/i', // <li> and </li>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
154 '/<li[^>]*>/i', // <li>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
155 '/<hr[^>]*>/i', // <hr>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
156 '/<div[^>]*>/i', // <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
157 '/(<table[^>]*>|<\/table>)/i', // <table> and </table>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
158 '/(<tr[^>]*>|<\/tr>)/i', // <tr> and </tr>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
159 '/<td[^>]*>(.*?)<\/td>/i', // <td> and </td>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
160 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
161
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
162 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
163 * List of pattern replacements corresponding to patterns searched.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
164 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
165 * @var array $replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
166 * @see $search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
167 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
168 protected $replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
169 '', // Non-legal carriage return
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
170 '', // Anything before <body>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
171 '', // <head>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
172 '', // <script>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
173 '', // <style>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
174 ' ', // Newlines and tabs
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
175 "\n\n", // <p>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
176 "\n<div>", // </p> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
177 '<div>', // <br> before <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
178 "\n", // <br>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
179 '_\\1_', // <i>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
180 '_\\1_', // <em>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
181 "\n\n", // <ul> and </ul>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
182 "\n\n", // <ol> and </ol>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
183 "\t* \\1\n", // <li> and </li>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
184 "\n\t* ", // <li>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
185 "\n-------------------------\n", // <hr>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
186 "<div>\n", // <div>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
187 "\n\n", // <table> and </table>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
188 "\n", // <tr> and </tr>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
189 "\t\t\\1\n", // <td> and </td>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
190 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
191
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
192 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
193 * List of preg* regular expression patterns to search for,
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
194 * used in conjunction with $ent_replace.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
195 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
196 * @var array $ent_search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
197 * @see $ent_replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
198 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
199 protected $ent_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
200 '/&(nbsp|#160);/i', // Non-breaking space
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
201 '/&(quot|rdquo|ldquo|#8220|#8221|#147|#148);/i',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
202 // Double quotes
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
203 '/&(apos|rsquo|lsquo|#8216|#8217);/i', // Single quotes
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
204 '/&gt;/i', // Greater-than
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
205 '/&lt;/i', // Less-than
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
206 '/&(copy|#169);/i', // Copyright
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
207 '/&(trade|#8482|#153);/i', // Trademark
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
208 '/&(reg|#174);/i', // Registered
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
209 '/&(mdash|#151|#8212);/i', // mdash
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
210 '/&(ndash|minus|#8211|#8722);/i', // ndash
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
211 '/&(bull|#149|#8226);/i', // Bullet
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
212 '/&(pound|#163);/i', // Pound sign
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
213 '/&(euro|#8364);/i', // Euro sign
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
214 '/&(amp|#38);/i', // Ampersand: see _converter()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
215 '/[ ]{2,}/', // Runs of spaces, post-handling
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
216 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
217
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
218 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
219 * List of pattern replacements corresponding to patterns searched.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
220 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
221 * @var array $ent_replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
222 * @see $ent_search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
223 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
224 protected $ent_replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
225 "\xC2\xA0", // Non-breaking space
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
226 '"', // Double quotes
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
227 "'", // Single quotes
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
228 '>',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
229 '<',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
230 '(c)',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
231 '(tm)',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
232 '(R)',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
233 '--',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
234 '-',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
235 '*',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
236 '£',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
237 'EUR', // Euro sign. €
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
238 '|+|amp|+|', // Ampersand: see _converter()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
239 ' ', // Runs of spaces, post-handling
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
240 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
241
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
242 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
243 * List of preg* regular expression patterns to search for
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
244 * and replace using callback function.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
245 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
246 * @var array $callback_search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
247 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
248 protected $callback_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
249 '/<(a) [^>]*href=("|\')([^"\']+)\2[^>]*>(.*?)<\/a>/i', // <a href="">
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
250 '/<(h)[123456]( [^>]*)?>(.*?)<\/h[123456]>/i', // h1 - h6
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
251 '/<(b)( [^>]*)?>(.*?)<\/b>/i', // <b>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
252 '/<(strong)( [^>]*)?>(.*?)<\/strong>/i', // <strong>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
253 '/<(th)( [^>]*)?>(.*?)<\/th>/i', // <th> and </th>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
254 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
255
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
256 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
257 * List of preg* regular expression patterns to search for in PRE body,
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
258 * used in conjunction with $pre_replace.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
259 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
260 * @var array $pre_search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
261 * @see $pre_replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
262 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
263 protected $pre_search = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
264 "/\n/",
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
265 "/\t/",
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
266 '/ /',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
267 '/<pre[^>]*>/',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
268 '/<\/pre>/'
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
269 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
270
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
271 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
272 * List of pattern replacements corresponding to patterns searched for PRE body.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
273 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
274 * @var array $pre_replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
275 * @see $pre_search
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
276 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
277 protected $pre_replace = array(
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
278 '<br>',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
279 '&nbsp;&nbsp;&nbsp;&nbsp;',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
280 '&nbsp;',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
281 '',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
282 ''
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
283 );
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
284
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
285 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
286 * Contains a list of HTML tags to allow in the resulting text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
287 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
288 * @var string $allowed_tags
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
289 * @see set_allowed_tags()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
290 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
291 protected $allowed_tags = '';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
292
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
293 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
294 * Contains the base URL that relative links should resolve to.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
295 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
296 * @var string $url
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
297 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
298 protected $url;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
299
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
300 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
301 * Indicates whether content in the $html variable has been converted yet.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
302 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
303 * @var boolean $_converted
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
304 * @see $html, $text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
305 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
306 protected $_converted = false;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
307
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
308 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
309 * Contains URL addresses from links to be rendered in plain text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
310 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
311 * @var array $_link_list
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
312 * @see _build_link_list()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
313 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
314 protected $_link_list = array();
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
315
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
316 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
317 * Boolean flag, true if a table of link URLs should be listed after the text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
318 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
319 * @var boolean $_do_links
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
320 * @see __construct()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
321 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
322 protected $_do_links = true;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
323
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
324 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
325 * Constructor.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
326 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
327 * If the HTML source string (or file) is supplied, the class
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
328 * will instantiate with that source propagated, all that has
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
329 * to be done it to call get_text().
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
330 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
331 * @param string $source HTML content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
332 * @param boolean $from_file Indicates $source is a file to pull content from
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
333 * @param boolean $do_links Indicate whether a table of link URLs is desired
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
334 * @param integer $width Maximum width of the formatted text, 0 for no limit
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
335 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
336 function __construct($source = '', $from_file = false, $do_links = true, $width = 75, $charset = 'UTF-8')
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
337 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
338 if (!empty($source)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
339 $this->set_html($source, $from_file);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
340 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
341
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
342 $this->set_base_url();
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
343
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
344 $this->_do_links = $do_links;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
345 $this->width = $width;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
346 $this->charset = $charset;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
347 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
348
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
349 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
350 * Loads source HTML into memory, either from $source string or a file.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
351 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
352 * @param string $source HTML content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
353 * @param boolean $from_file Indicates $source is a file to pull content from
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
354 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
355 function set_html($source, $from_file = false)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
356 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
357 if ($from_file && file_exists($source)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
358 $this->html = file_get_contents($source);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
359 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
360 else {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
361 $this->html = $source;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
362 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
363
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
364 $this->_converted = false;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
365 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
366
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
367 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
368 * Returns the text, converted from HTML.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
369 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
370 * @return string Plain text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
371 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
372 function get_text()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
373 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
374 if (!$this->_converted) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
375 $this->_convert();
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
376 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
377
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
378 return $this->text;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
379 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
380
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
381 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
382 * Prints the text, converted from HTML.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
383 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
384 function print_text()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
385 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
386 print $this->get_text();
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
387 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
388
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
389 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
390 * Sets the allowed HTML tags to pass through to the resulting text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
391 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
392 * Tags should be in the form "<p>", with no corresponding closing tag.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
393 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
394 function set_allowed_tags($allowed_tags = '')
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
395 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
396 if (!empty($allowed_tags)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
397 $this->allowed_tags = $allowed_tags;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
398 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
399 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
400
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
401 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
402 * Sets a base URL to handle relative links.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
403 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
404 function set_base_url($url = '')
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
405 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
406 if (empty($url)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
407 if (!empty($_SERVER['HTTP_HOST'])) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
408 $this->url = 'http://' . $_SERVER['HTTP_HOST'];
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
409 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
410 else {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
411 $this->url = '';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
412 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
413 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
414 else {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
415 // Strip any trailing slashes for consistency (relative
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
416 // URLs may already start with a slash like "/file.html")
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
417 if (substr($url, -1) == '/') {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
418 $url = substr($url, 0, -1);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
419 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
420 $this->url = $url;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
421 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
422 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
423
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
424 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
425 * Workhorse function that does actual conversion (calls _converter() method).
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
426 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
427 protected function _convert()
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
428 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
429 // Variables used for building the link list
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
430 $this->_link_list = array();
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
431
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
432 $text = $this->html;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
433
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
434 // Convert HTML to TXT
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
435 $this->_converter($text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
436
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
437 // Add link list
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
438 if (!empty($this->_link_list)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
439 $text .= "\n\nLinks:\n------\n";
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
440 foreach ($this->_link_list as $idx => $url) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
441 $text .= '[' . ($idx+1) . '] ' . $url . "\n";
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
442 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
443 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
444
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
445 $this->text = $text;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
446 $this->_converted = true;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
447 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
448
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
449 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
450 * Workhorse function that does actual conversion.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
451 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
452 * First performs custom tag replacement specified by $search and
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
453 * $replace arrays. Then strips any remaining HTML tags, reduces whitespace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
454 * and newlines to a readable format, and word wraps the text to
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
455 * $width characters.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
456 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
457 * @param string &$text Reference to HTML content string
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
458 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
459 protected function _converter(&$text)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
460 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
461 // Convert <BLOCKQUOTE> (before PRE!)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
462 $this->_convert_blockquotes($text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
463
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
464 // Convert <PRE>
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
465 $this->_convert_pre($text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
466
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
467 // Run our defined tags search-and-replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
468 $text = preg_replace($this->search, $this->replace, $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
469
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
470 // Run our defined tags search-and-replace with callback
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
471 $text = preg_replace_callback($this->callback_search, array($this, 'tags_preg_callback'), $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
472
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
473 // Strip any other HTML tags
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
474 $text = strip_tags($text, $this->allowed_tags);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
475
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
476 // Run our defined entities/characters search-and-replace
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
477 $text = preg_replace($this->ent_search, $this->ent_replace, $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
478
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
479 // Replace known html entities
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
480 $text = html_entity_decode($text, ENT_QUOTES, $this->charset);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
481
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
482 // Replace unicode nbsp to regular spaces
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
483 $text = preg_replace('/\xC2\xA0/', ' ', $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
484
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
485 // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
486 $text = preg_replace('/&([a-zA-Z0-9]{2,6}|#[0-9]{2,4});/', '', $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
487
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
488 // Convert "|+|amp|+|" into "&", need to be done after handling of unknown entities
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
489 // This properly handles situation of "&amp;quot;" in input string
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
490 $text = str_replace('|+|amp|+|', '&', $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
491
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
492 // Bring down number of empty lines to 2 max
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
493 $text = preg_replace("/\n\s+\n/", "\n\n", $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
494 $text = preg_replace("/[\n]{3,}/", "\n\n", $text);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
495
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
496 // remove leading empty lines (can be produced by eg. P tag on the beginning)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
497 $text = ltrim($text, "\n");
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
498
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
499 // Wrap the text to a readable format
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
500 // for PHP versions >= 4.0.2. Default width is 75
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
501 // If width is 0 or less, don't wrap the text.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
502 if ( $this->width > 0 ) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
503 $text = wordwrap($text, $this->width);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
504 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
505 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
506
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
507 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
508 * Helper function called by preg_replace() on link replacement.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
509 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
510 * Maintains an internal list of links to be displayed at the end of the
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
511 * text, with numeric indices to the original point in the text they
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
512 * appeared. Also makes an effort at identifying and handling absolute
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
513 * and relative links.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
514 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
515 * @param string $link URL of the link
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
516 * @param string $display Part of the text to associate number with
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
517 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
518 protected function _build_link_list($link, $display)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
519 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
520 if (!$this->_do_links || empty($link)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
521 return $display;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
522 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
523
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
524 // Ignored link types
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
525 if (preg_match('!^(javascript:|mailto:|#)!i', $link)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
526 return $display;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
527 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
528
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
529 // skip links with href == content (#1490434)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
530 if ($link === $display) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
531 return $display;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
532 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
533
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
534 if (preg_match('!^([a-z][a-z0-9.+-]+:)!i', $link)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
535 $url = $link;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
536 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
537 else {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
538 $url = $this->url;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
539 if (substr($link, 0, 1) != '/') {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
540 $url .= '/';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
541 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
542 $url .= "$link";
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
543 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
544
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
545 if (($index = array_search($url, $this->_link_list)) === false) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
546 $index = count($this->_link_list);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
547 $this->_link_list[] = $url;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
548 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
549
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
550 return $display . ' [' . ($index+1) . ']';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
551 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
552
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
553 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
554 * Helper function for PRE body conversion.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
555 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
556 * @param string &$text HTML content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
557 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
558 protected function _convert_pre(&$text)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
559 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
560 // get the content of PRE element
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
561 while (preg_match('/<pre[^>]*>(.*)<\/pre>/ismU', $text, $matches)) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
562 $this->pre_content = $matches[1];
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
563
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
564 // Run our defined tags search-and-replace with callback
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
565 $this->pre_content = preg_replace_callback($this->callback_search,
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
566 array($this, 'tags_preg_callback'), $this->pre_content);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
567
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
568 // convert the content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
569 $this->pre_content = sprintf('<div><br>%s<br></div>',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
570 preg_replace($this->pre_search, $this->pre_replace, $this->pre_content));
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
571
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
572 // replace the content (use callback because content can contain $0 variable)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
573 $text = preg_replace_callback('/<pre[^>]*>.*<\/pre>/ismU',
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
574 array($this, 'pre_preg_callback'), $text, 1);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
575
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
576 // free memory
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
577 $this->pre_content = '';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
578 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
579 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
580
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
581 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
582 * Helper function for BLOCKQUOTE body conversion.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
583 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
584 * @param string &$text HTML content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
585 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
586 protected function _convert_blockquotes(&$text)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
587 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
588 $level = 0;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
589 $offset = 0;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
590 while (($start = stripos($text, '<blockquote', $offset)) !== false) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
591 $offset = $start + 12;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
592 do {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
593 $end = stripos($text, '</blockquote>', $offset);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
594 $next = stripos($text, '<blockquote', $offset);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
595
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
596 // nested <blockquote>, skip
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
597 if ($next !== false && $next < $end) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
598 $offset = $next + 12;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
599 $level++;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
600 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
601 // nested </blockquote> tag
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
602 if ($end !== false && $level > 0) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
603 $offset = $end + 12;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
604 $level--;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
605 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
606 // found matching end tag
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
607 else if ($end !== false && $level == 0) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
608 $taglen = strpos($text, '>', $start) - $start;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
609 $startpos = $start + $taglen + 1;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
610
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
611 // get blockquote content
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
612 $body = trim(substr($text, $startpos, $end - $startpos));
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
613
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
614 // adjust text wrapping width
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
615 $p_width = $this->width;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
616 if ($this->width > 0) $this->width -= 2;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
617
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
618 // replace content with inner blockquotes
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
619 $this->_converter($body);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
620
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
621 // resore text width
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
622 $this->width = $p_width;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
623
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
624 // Add citation markers and create <pre> block
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
625 $body = preg_replace_callback('/((?:^|\n)>*)([^\n]*)/', array($this, 'blockquote_citation_callback'), trim($body));
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
626 $body = '<pre>' . htmlspecialchars($body) . '</pre>';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
627
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
628 $text = substr_replace($text, $body . "\n", $start, $end + 13 - $start);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
629 $offset = 0;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
630
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
631 break;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
632 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
633 // abort on invalid tag structure (e.g. no closing tag found)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
634 else {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
635 break;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
636 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
637 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
638 while ($end || $next);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
639 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
640 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
641
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
642 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
643 * Callback function to correctly add citation markers for blockquote contents
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
644 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
645 public function blockquote_citation_callback($m)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
646 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
647 $line = ltrim($m[2]);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
648 $space = $line[0] == '>' ? '' : ' ';
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
649
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
650 return $m[1] . '>' . $space . $line;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
651 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
652
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
653 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
654 * Callback function for preg_replace_callback use.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
655 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
656 * @param array $matches PREG matches
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
657 * @return string
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
658 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
659 public function tags_preg_callback($matches)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
660 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
661 switch (strtolower($matches[1])) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
662 case 'b':
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
663 case 'strong':
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
664 return $this->_toupper($matches[3]);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
665 case 'th':
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
666 return $this->_toupper("\t\t". $matches[3] ."\n");
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
667 case 'h':
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
668 return $this->_toupper("\n\n". $matches[3] ."\n\n");
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
669 case 'a':
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
670 // Remove spaces in URL (#1487805)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
671 $url = str_replace(' ', '', $matches[3]);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
672 return $this->_build_link_list($url, $matches[4]);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
673 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
674 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
675
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
676 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
677 * Callback function for preg_replace_callback use in PRE content handler.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
678 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
679 * @param array $matches PREG matches
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
680 * @return string
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
681 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
682 public function pre_preg_callback($matches)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
683 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
684 return $this->pre_content;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
685 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
686
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
687 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
688 * Strtoupper function with HTML tags and entities handling.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
689 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
690 * @param string $str Text to convert
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
691 * @return string Converted text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
692 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
693 private function _toupper($str)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
694 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
695 // string can containing HTML tags
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
696 $chunks = preg_split('/(<[^>]*>)/', $str, null, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
697
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
698 // convert toupper only the text between HTML tags
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
699 foreach ($chunks as $idx => $chunk) {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
700 if ($chunk[0] != '<') {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
701 $chunks[$idx] = $this->_strtoupper($chunk);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
702 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
703 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
704
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
705 return implode($chunks);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
706 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
707
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
708 /**
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
709 * Strtoupper multibyte wrapper function with HTML entities handling.
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
710 *
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
711 * @param string $str Text to convert
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
712 * @return string Converted text
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
713 */
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
714 private function _strtoupper($str)
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
715 {
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
716 $str = html_entity_decode($str, ENT_COMPAT, $this->charset);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
717 $str = mb_strtoupper($str);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
718 $str = htmlspecialchars($str, ENT_COMPAT, $this->charset);
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
719
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
720 return $str;
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
721 }
4681f974d28b vanilla 1.3.3 distro, I hope
Charlie Root
parents:
diff changeset
722 }