Python non-ASCII characters

My script uses strings that sometimes include foreign characters that don't exist in ASCII (ö, ê, and so on). How do I get them to work in Python? For example, the character 'µ' gets changed to '\xb5', which makes my script fail. Is there any way to make this work?

I read somewhere that Python 3 changes the way it handles encodings; should I try to upgrade instead?



You need to convert your text into unicode. If you are reading those strings from files, then chances are they are in UTF-8, so the following should work:

my_unicode = my_string.decode('utf-8')

Although it depends on the format you are trying to read.
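The same decode/encode round trip looks like this in Python 3, where the distinction is explicit: bytes hold encoded data, str holds Unicode text (a minimal sketch, not from the original answer):

```python
# Python 3 sketch: bytes read from a file or socket must be decoded
# into a str before you can treat them as text.
raw = b'47\xc2\xb5'          # UTF-8 bytes for '47µ'
text = raw.decode('utf-8')
assert text == '47µ'

# Going the other way, encode a str back into bytes:
assert text.encode('utf-8') == raw
```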


Definitely read this if you haven't already:

http://docs.python.org/2/howto/unicode.html

Python 2.6 can work with unicode well enough, but Python 3.x does it a bit more elegantly.


That, and if (s)he wants to write those characters directly into the source file, you have to include # -*- coding: utf-8 -*- as the first (or second) line, then just type u'mêlée'.

In Python 3, UTF-8 is the default source encoding, and strings without a prefix are unicode strings, so just write 'naïve' there.
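A quick Python 3 sketch of that point: the literal is already a Unicode str, and its length is counted in characters, not in encoded bytes:

```python
# Python 3: string literals are Unicode (str) by default; no u'' prefix needed.
s = 'naïve'
assert isinstance(s, str)
assert len(s) == 5             # five characters...

# ...but six bytes once encoded, since 'ï' takes two bytes in UTF-8:
assert len(s.encode('utf-8')) == 6
```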

In both cases, when using Windows, you have to ensure that your editor saves files as UTF-8. Windows Notepad won't do.


You can fake Py3 string behavior in Py2 by doing the following.


Put # -*- coding: utf-8 -*- on the first or second line of every .py file, and make sure your editor is set to save your .py files as UTF-8 encoded.

Put from __future__ import unicode_literals before any other imports. Now every hard-coded string in your .py file is a unicode object instead of a str object.

Do from io import open to replace Py2's open built-in function with Py3's open built-in function. This new open function lets you easily specify an encoding. Reading from a file opened with the new open produces a unicode string, and writing automatically encodes your unicode string. I find it a little easier to use than codecs.open.

Set your terminal window to UTF-8, if possible.

If you are stuck with Windows CMD.exe, your terminal will not be able to handle every Unicode character. It defaults to CP437 encoding, which can only handle 256 distinct characters. Since you are interested in handling both accented Latin and the micron character, CP437 might be fine. However, it can't handle curly quotes, the euro symbol, and whatnot. Therefore, you might want to set it to CP1252. This, too, is limited to 256 distinct characters, but they're more useful from an office-y standpoint.
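You can check which characters each code page covers by encoding them (a small illustration, assuming a Python with the standard cp437/cp1252 codecs):

```python
# CP437 covers accented Latin and the micro sign, but not the euro or
# curly quotes; CP1252 covers those, yet still only 256 code points.
assert 'µ'.encode('cp437') == b'\xe6'    # micro sign exists in CP437
assert '€'.encode('cp1252') == b'\x80'   # euro sign exists in CP1252

try:
    '€'.encode('cp437')                  # ...but not in CP437
except UnicodeEncodeError:
    pass                                 # exactly the "terminal bombs" case
```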

Type chcp in your CMD Command Prompt window to see the current "code page." It probably says 437. Type chcp 1252 to change it to CP1252. Now go to the Properties of your CMD Command Prompt window. Make sure the Font is a TrueType font such as Lucida Console instead of "Raster Fonts", and save this setting. Now you're all set for CP1252 terminal I/O. Python will automatically decode/encode the terminal I/O to/from unicode string objects. If you find that some terminal output bombs, change your print foo Python statements to print repr(foo).
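The repr(foo) trick works because Python 2's repr escapes non-ASCII characters into a pure-ASCII form any terminal can print. In Python 3, repr keeps non-ASCII characters; the equivalent escaping function there is ascii() (sketch, not part of the original answer):

```python
# ascii() is Python 3's counterpart to Python 2's non-ASCII-escaping repr():
# it renders any string as a pure-ASCII literal, safe on any terminal.
s = '47µ'
assert ascii(s) == "'47\\xb5'"   # µ escaped as \xb5
print(ascii(s))                  # always printable, whatever the code page
```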

Put the following code in a file called unitrial.py for experiments:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from io import open
import sys

print sys.stdin.encoding, sys.stdout.encoding

melee = 'mêlée'
print type(melee), melee, repr(melee)

micron = '47µ'
print type(micron), micron, repr(micron)

doc = '“What’s up doc?”'  # curly punctuation
print type(doc), doc, repr(doc)

with open('unitrial.py', 'rt', encoding='utf8') as f:
    print
    print f.encoding
    print f.read()