Hi all,
I’m trying to read the contents of an MS Word document using fread or file_get_contents.
Reading the file works fine with both.
But my problem is explained in the attached file.
[ATTACH]15549[/ATTACH]
I want to ignore non-English characters, because they are always converted into strange characters.
This is my code:
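[code=php]
function parseWord($userDoc)
{
    // Read the whole file as raw bytes and convert it to UTF-8.
    $fileHandle = fopen($userDoc, "r");
    $line = mb_convert_encoding(@fread($fileHandle, filesize($userDoc)), "UTF-8");
    fclose($fileHandle);

    // Split on carriage returns; keep only non-empty chunks without NUL bytes.
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        $pos = strpos($thisline, chr(0x00));
        if ($pos === false && strlen($thisline) > 0) {
            $outtext .= $thisline . " ";
        }
    }

    // Strip everything except ASCII letters, digits, whitespace and some punctuation.
    $outtext = preg_replace('/[^a-zA-Z0-9\s,.\-\n\r\t@\/_()]/', "", $outtext);
    return $outtext;
}
[/code]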
Can someone help?