Weird Characters <?> and Â

I am working on relaunching a website and am having some data display issues. I'm not sure whether they are CSS problems, general UTF character-encoding issues, or something else.
————————————————————————–


[B]Here is what the live site looks like ([COLOR="Blue"]good[/COLOR]):[/B]
————————————————————————–

[url]http://www.webdeveloper.com/forum/attachment.php?attachmentid=11937&stc=1&d=1228773724[/url]


————————————————————————–

[B]Here is what my new site looks like, with identical data ([COLOR="Red"]bad[/COLOR]):[/B]
————————————————————————–


[url]http://www.webdeveloper.com/forum/attachment.php?attachmentid=11936&stc=1&d=1228773724[/url]
————————————————————————–

[B]Here is what the data looks like[/B]

[code=php]<p class="MsoNormal" ><b><span><o:p>Â </o:p></span></b></p>
<p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>i.<span>Â Â Â </span>Types of Heart Valve Disorders<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Valvular Stenosis<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Valvular Regurgitation<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Valve Prolapse<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>ii.<span>Â Â Â </span>Heart Valve Disorder Treatment Strategies<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>iii.<span>Â Â </span>Heart Valve Repairs and Replacements, Combined Procedure Volumes Forecast<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Annuloplasty Device Implantations<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Mechanical Heart Valve Implantations<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Tissue Valve Implantations<o:p></o:p></span></p>
[/code]

[B][COLOR="Blue"]
What would make the same data display correctly in one instance and incorrectly in another?[/COLOR]
[/B]

7 Comments

@Stephen_Philbin (Dec 08, 2008): My first guess would be the character encoding being different for each site, but I can't check that from those images. I would need to view the actual page in my web browser to check.
@Mindzai (Dec 08, 2008): Also, is there any reason you are using <p> instead of <ol> (which is designed for exactly what you are doing here)?
@Stephen_Philbin (Dec 08, 2008): And why do you have <o:p/> elements? Are you using mixed namespaces in XML?
@ripcurlksm (author, Dec 08, 2008): The <p> and <o:p/> elements are there because our source let us copy and paste the content from their website. With so many indents, that is easier than reworking 200 pages of 300-line outlines by hand.

Back to the issue of getting the "new" site to match the old results with the character encoding...

I am noticing that on the old "working" page I have no encoding at all
[code=html]<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
</head>[/code]

And on the new site I have:
[code=html]<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml2/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>[/code]


However, if I try to remove the encoding from the new site to match the old one, I get the same result. I will try to post a live example.
@Stephen_Philbin (Dec 09, 2008): I was only asking about the <o:p/> elements because they don't belong in standard HTML. I was just trying to eliminate them as a possible cause of your problem. I wasn't calling you stupid or anything. If you want to fix them with almost no effort, there are plenty of tools that will find and replace repeated errors for you in a second or so. Personally, I use the search and replace feature of one of my favourite IDEs: [url=http://jedit.org/]jEdit[/url].

Anyway, I had a look at the sample markup you posted, and based on that it looks like the new page is actually being served as ISO-8859-1 or something similar. The markup from both pages specifies that the character encoding is UTF-8 (I think you got mixed up between character encoding and a DOCTYPE declaration).

Is the new version of your page hosted on a different host or served by a different HTTP server? If so, then it's probably simply a case of configuring the HTTP server that is serving your new version to serve pages as UTF-8 by default, or of using PHP to send a header specifying UTF-8 on a page-by-page basis. The reason you'd need to do this is that an encoding specified in an HTML meta tag is only a fallback in case no encoding is specified by the HTTP server. If the HTTP server does specify an encoding, it takes precedence over any other encoding declaration.

So my guess is your meta tag is saying UTF-8, but your HTTP server is saying ISO-8859-1 (or something very similar).
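
To see which charset the server itself announces, one option is to read the Content-Type response header and pull out its charset parameter. The sketch below is only illustrative: `charset_from_content_type()` is a hypothetical helper name, the header values are made-up examples, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from this thread): extract the charset from a
// Content-Type header value such as "text/html; charset=ISO-8859-1".
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // no charset in the header, so the meta tag would be the fallback
}

// Example header values. A real check would take the value from the live
// response, for instance via get_headers() or the browser's network tools.
var_dump(charset_from_content_type('text/html; charset=ISO-8859-1'));
var_dump(charset_from_content_type('text/html'));
```

If the first call reports ISO-8859-1 for the new site while the old site reports UTF-8 or nothing, that would support the guess above.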

You might also run into problems if you're copying the content of another site by copying the page content as rendered by your browser, rather than viewing and copying the source HTML.
@ripcurlksm (author, Dec 09, 2008): Thanks for the clarification, Stephen. No worries, you have been very helpful.

Per your advice, I am setting up a few test pages to manually set the encoding. Both of these encoding sets still output different 'bad' characters. I suppose the best result I would want is for a 'blank space' or 'nothing' to appear in the place of the weird characters.

test-utf.php
[code=php]<?php
header('Content-Type: text/html; charset=UTF-8');

echo '<p class="MsoNormal" ><b><span><o:p>Â </o:p></span></b></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>i.<span>Â Â Â </span>Types of Heart Valve Disorders<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Valvular Stenosis<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Valvular Regurgitation<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Valve Prolapse<o:p></o:p></span></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>ii.<span>Â Â Â </span>Heart Valve Disorder Treatment Strategies<o:p></o:p></span></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>iii.<span>Â Â </span>Heart Valve Repairs and Replacements, Combined Procedure Volumes Forecast<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Annuloplasty Device Implantations<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Mechanical Heart Valve Implantations<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Tissue Valve Implantations<o:p></o:p></span></p> ';
?>[/code]


test-iso.php
[code=php]<?php
header('Content-Type: text/html; charset=ISO-8859-1');

echo '<p class="MsoNormal" ><b><span><o:p>Â </o:p></span></b></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>i.<span>Â Â Â </span>Types of Heart Valve Disorders<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Valvular Stenosis<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Valvular Regurgitation<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Valve Prolapse<o:p></o:p></span></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>ii.<span>Â Â Â </span>Heart Valve Disorder Treatment Strategies<o:p></o:p></span></p> <p class="MsoNormal" style="margin-left: 0.25in; text-indent: -0.25in;"><span>iii.<span>Â Â </span>Heart Valve Repairs and Replacements, Combined Procedure Volumes Forecast<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>a. Annuloplasty Device Implantations<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>b. Mechanical Heart Valve Implantations<o:p></o:p></span></p> <p class="MsoNormal" style="text-indent: 0.25in;"><span>c. Tissue Valve Implantations<o:p></o:p></span></p> ';
?>[/code]
@Stephen_Philbin (Dec 10, 2008): There's a good chance that the tests you've posted will be self-defeating. They will probably display the same characters on the rendered page as the ones you pasted in, simply because you pasted them in. It's likely your IDE is taking your input and re-encoding it according to its settings.
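
One way to sidestep editor re-encoding is to inspect the raw bytes of the stored string rather than the rendered characters. This is a rough sketch, assuming PHP 7 or later; `nbsp_byte_pattern()` is a hypothetical helper and the sample string is made up. It relies on a UTF-8 non-breaking space being the byte pair C2 A0, and on a literal "Â" saved as UTF-8 being C3 82.

```php
<?php
// Hypothetical helper: report how a non-breaking space is stored in $s.
function nbsp_byte_pattern(string $s): string
{
    // "Â" (C3 82) followed by a no-break space (C2 A0), both saved as UTF-8:
    // the text was mis-decoded once and then saved again.
    if (strpos($s, "\xC3\x82\xC2\xA0") !== false) {
        return 'double-encoded';
    }
    // A plain UTF-8 no-break space.
    if (strpos($s, "\xC2\xA0") !== false) {
        return 'utf8-nbsp';
    }
    return 'none';
}

$sample = "i.\xC2\xA0Types of Heart Valve Disorders"; // made-up test string
echo bin2hex(substr($sample, 0, 4)), "\n";  // 692ec2a0
echo nbsp_byte_pattern($sample), "\n";      // utf8-nbsp
```

A 'double-encoded' result would suggest the pasted test data already contains a real "Â", in which case no header change can hide it.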

As far as I can see (based on the incorrect characters displayed on the screen and the look of the "working" page), your initial problem is that your page is [i]supposed[/i] to be displaying a non-breaking space where you are getting the incorrect characters. It matches because the UTF-8 form of a non-breaking space uses two bytes: "C2" and "A0". If you take these two bytes and display them with the ISO-8859-1 encoding, the "C2" byte will be displayed as the "Â" character and the "A0" byte will be displayed as a non-breaking space.
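
The byte-level explanation above can be reproduced in a few lines. This is a minimal sketch, assuming PHP 7 or later with the mbstring extension enabled; the variable names are illustrative only.

```php
<?php
// A no-break space (U+00A0) stored as UTF-8 is two bytes: C2 A0.
$nbsp = "\u{00A0}";
echo bin2hex($nbsp), "\n"; // c2a0

// Read those same two bytes as ISO-8859-1 and convert the result to UTF-8,
// which is roughly what a browser does when the server says ISO-8859-1.
$misread = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n"; // c382c2a0

// The misread text is "Â" (U+00C2) followed by a no-break space (U+00A0).
echo $misread === "\u{00C2}\u{00A0}" ? "Â + NBSP\n" : "something else\n";
```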

A simple fix would be to use an entity reference instead. Either "[b]&nbsp;[/b]", "[b]&#xA0;[/b]", or "[b]&#160;[/b]" should do the job (regardless of whether the page is served as UTF-8 or ISO-8859-1). The best fix would be to simply use better, more semantic markup, which would completely negate the need for the non-breaking spaces and make everything much simpler, but I'm guessing that's not really an option because you get it from an external source.
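
If the content passes through PHP before it is output, the entity swap could be automated along these lines. This is a sketch under assumptions: the stored data is taken to be valid UTF-8 containing real non-breaking spaces (bytes C2 A0), and `nbsp_to_entity()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: replace raw UTF-8 no-break spaces with the &nbsp; entity.
// The entity is plain ASCII, so it reads the same under UTF-8 and ISO-8859-1.
function nbsp_to_entity(string $html): string
{
    return str_replace("\xC2\xA0", '&nbsp;', $html);
}

// Fragment modelled on the outline data, with three raw no-break spaces.
$fragment = "<span>i.\xC2\xA0\xC2\xA0\xC2\xA0</span>Types of Heart Valve Disorders";
echo nbsp_to_entity($fragment), "\n";
// <span>i.&nbsp;&nbsp;&nbsp;</span>Types of Heart Valve Disorders
```

If the stored data already contains a literal "Â" in front of each space, this replacement alone would not remove it; the data would need to be re-imported with the correct encoding first.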

How are you obtaining the content for the sites anyway? Do the sites use a PHP script to automatically download the content from the source or do you download the content manually? Or perhaps is one automated and the other done manually?