
Trying to read MS Word content

Hi all,

I'm trying to read the contents of an MS Word file using fread or file_get_contents.
Both work fine.

But my problem is explained in the attached file.

I want to ignore non-English characters, because they are always converted into strange characters.
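"Strange characters" are often a sign of an encoding mismatch, for example 8-bit Windows-1252 bytes being treated as UTF-8. The minimal sketch below assumes the 8-bit text in the .doc file is Windows-1252. That is common for Western-language Word 97-2003 files but not guaranteed, and Word can also store text as UTF-16, which this does not handle. The idea is to decode the bytes explicitly first, then strip non-ASCII characters only if you really want to drop them. `decodeDocChunk` and `asciiOnly` are hypothetical helper names.

```php
<?php
// Hypothetical helper. ASSUMES the chunk is 8-bit Windows-1252 text
// (typical for Western-language Word 97-2003 files, not guaranteed).
function decodeDocChunk(string $bytes): string
{
    // Map each 8-bit byte to its UTF-8 form, e.g. 0xE9 becomes "é".
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper: keep printable ASCII plus tab, CR and LF; every other
// byte (including the bytes of accented or non-Latin letters) is removed.
function asciiOnly(string $text): string
{
    return preg_replace('/[^\x20-\x7E\t\r\n]/', '', $text);
}

$raw = "caf\xE9 cr\xE8me";                           // Windows-1252 bytes
echo decodeDocChunk($raw), PHP_EOL;                 // café crème
echo asciiOnly(decodeDocChunk($raw)), PHP_EOL;      // caf crme
```

Decoding first means you can choose later whether to keep the non-English letters (print the decoded string) or ignore them (run `asciiOnly` on it), instead of displaying raw bytes in the wrong encoding.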

This is my code:

[code=php]
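function parseWord($userDoc)
{
    // Read the raw bytes of the binary (Word 97-2003) .doc file.
    $content = file_get_contents($userDoc);
    if ($content === false) {
        return '';
    }

    // Paragraph marks in Word's binary text are carriage returns (0x0D).
    $lines = explode(chr(0x0D), $content);
    $outtext = '';
    foreach ($lines as $thisline) {
        // Skip empty chunks and chunks containing NUL bytes (usually binary
        // structure data; this also skips any text Word stored as UTF-16).
        if (strlen($thisline) === 0 || strpos($thisline, chr(0x00)) !== false) {
            continue;
        }
        $outtext .= $thisline . ' ';
    }

    // Keep only ASCII letters, digits, whitespace and , . - @ / _ ( );
    // every other byte, including those of non-English characters, is removed.
    // No mb_convert_encoding() call: this filter would strip its output anyway.
    return preg_replace('/[^a-zA-Z0-9\s,.\-@\/_()]/', '', $outtext);
}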

[/code]

Can someone help?

(Attachments were removed during the site migration.)

PHP

3 Comment(s)

@darroosh2 (author), May 15, 2013: Any help?
@arronlee, Apr 21, 2014: PHP code? I have only built Word readers in C#.NET, with the help of some toolkits. You could search for a PHP SDK that does the same; check whether it offers a free trial package first, if possible. I hope you succeed. Good luck.

Best regards,

Arron
@Error404, Apr 21, 2014: Have you tried something like PHPDocX, or PHP's COM classes? If you need to load the content directly into a database such as Microsoft SQL Server or SQL Server Express, you could create a linked server that reads from the file, or simply import it (depending on how you want it to work). All of these options should be able to handle non-English characters.
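For the COM route, here is a minimal sketch that lets Word itself extract the text instead of scanning raw bytes. It assumes Windows, an installed copy of Microsoft Word, and PHP's com_dotnet extension; `readWordViaCom` is a hypothetical name. `Documents->Open`, `Content->Text`, `Close` and `Quit` are members of Word's object model, and passing `CP_UTF8` to the `COM` constructor asks PHP to return strings as UTF-8, so non-English characters should survive.

```php
<?php
// Hypothetical helper. ASSUMES Windows, Microsoft Word installed, and the
// com_dotnet extension enabled (which provides the COM class and CP_UTF8).
function readWordViaCom(string $path): string
{
    // Word needs an absolute path; fail early if the file does not exist.
    $full = realpath($path);
    if ($full === false) {
        throw new InvalidArgumentException("File not found: $path");
    }

    // CP_UTF8: strings coming back from Word are converted to UTF-8.
    $word = new COM('Word.Application', null, CP_UTF8);
    $word->Visible = false;
    try {
        $doc = $word->Documents->Open($full);
        $text = $doc->Content->Text;  // whole document body as plain text
        $doc->Close(false);           // close without saving changes
    } finally {
        $word->Quit();                // always shut down the hidden Word instance
    }
    return $text;
}
```

Each call starts a full Word process, and Microsoft generally advises against automating Office from unattended server processes, so this suits occasional local conversions better than a busy web server.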