Changing the default encoding of Python?

crosscheck 2020. 7. 12. 09:52

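I have many "can't encode" and "can't decode" problems with Python when I run my applications from the console. In the Eclipse PyDev IDE, however, the default character encoding is set to UTF-8, and everything works.

I searched for how to set the default encoding, and people say that Python deletes the sys.setdefaultencoding function on startup, so it cannot be used.

So what is the best solution?

Here is a simple method (hack) that gives you back the setdefaultencoding() function that was deleted from sys: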

import sys
# sys.setdefaultencoding() does not exist, here!
reload(sys)  # Reload does the trick!
sys.setdefaultencoding('UTF8')

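This is not a safe thing to do, though. It is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Re-enabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).

If you get this error when you try to pipe or redirect the output of your script

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

just export PYTHONIOENCODING in the console and then run your code.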

export PYTHONIOENCODING=utf8

A) To control the sys.getdefaultencoding() output:

python -c 'import sys; print(sys.getdefaultencoding())'

ascii

Then

echo "import sys; sys.setdefaultencoding('utf-16-be')" > sitecustomize.py

and

PYTHONPATH=".:$PYTHONPATH" python -c 'import sys; print(sys.getdefaultencoding())'

utf-16-be

You could put your sitecustomize.py higher in your PYTHONPATH.

You might also like to try reload(sys).setdefaultencoding, as suggested by @EOL.

B) To control stdin.encoding and stdout.encoding, set PYTHONIOENCODING:

python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'

ascii ascii

Then

PYTHONIOENCODING="utf-16-be" python -c 'import sys; 
print(sys.stdin.encoding, sys.stdout.encoding)'

utf-16-be utf-16-be

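Finally: you can use A) or B) or both!

Starting with PyDev 3.4.1, the default encoding is no longer changed. See this ticket for details.

For earlier versions, a solution is to make sure PyDev does not run with UTF-8 as the default encoding. Under Eclipse, open the run dialog settings ("run configurations", if I remember correctly); you can choose the default encoding on the Common tab. Change it to US-ASCII if you want to get these errors 'early' (in other words, in your PyDev environment). Also see the original blog post for this workaround.

Regarding python2 (and python2 only), some of the earlier answers rely on the following hack: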

import sys
reload(sys)  # Reload is a hack
sys.setdefaultencoding('UTF8')

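Using it is discouraged (check this or this).

In my case it comes with a side effect: I'm using ipython notebooks, and once I run the code, the 'print' function no longer works. I guess there is a solution to it, but I still think using the hack is not the correct option.

After trying many options, the one that worked for me was using the same code in sitecustomize.py, where that piece of code is meant to be. After that module is evaluated, the setdefaultencoding function is removed from sys.

So the solution is to append the following code to the file /usr/lib/python2.7/sitecustomize.py: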

import sys
sys.setdefaultencoding('UTF8')

When I use virtualenvwrapper, the file I edit is ~/.virtualenvs/venv-name/lib/python2.7/sitecustomize.py.

And when I use python notebooks and conda, it is ~/anaconda2/lib/python2.7/sitecustomize.py

There is an insightful blog post about it.

See https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/.

I paraphrase its content below.

In Python 2, which was not as strongly typed regarding the encoding of strings, you could perform operations on differently encoded strings and succeed. E.g. the following would return True.

u'Toshio' == 'Toshio'

That would hold for every (normal, unprefixed) string that was encoded in sys.getdefaultencoding(), which defaulted to ascii, but not others.

The default encoding was meant to be changed system-wide in site.py, but not somewhere else. The hacks (also presented here) to set it in user modules were just that: hacks, not the solution.

Python 3 changed the system encoding to default to utf-8 (when LC_CTYPE is unicode-aware), but the fundamental problem was solved with the requirement to explicitly encode "byte" strings whenever they are used with unicode strings.
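
To make the last point concrete, here is a minimal sketch of how Python 3 treats the same comparison. It assumes a Python 3 interpreter run without the -b/-bb warning flags; the variable names are only illustrative.

```python
# Python 3: text (str) and bytes never mix implicitly.
text = 'Toshio'
raw = b'Toshio'

# Unlike Python 2's u'Toshio' == 'Toshio', this comparison is simply False.
same_without_decoding = (text == raw)

# Combining the two types raises TypeError instead of silently decoding.
try:
    text + raw
    mixing_failed = False
except TypeError:
    mixing_failed = True

# The explicit route: name the encoding yourself.
same_after_decoding = (text == raw.decode('ascii'))

print(same_without_decoding, mixing_failed, same_after_decoding)
```

Because the decode step is spelled out, there is no hidden default encoding left for a hack to change.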

First: calling reload(sys) and setting some arbitrary default encoding just to satisfy an output terminal stream is bad practice. reload often resets things in sys that were put in place depending on the environment, e.g. the sys.stdin/stdout streams, sys.excepthook, etc.

Solving the encode problem on stdout

The best solution I know for the encode problem of printing unicode strings and beyond-ascii str's (e.g. from literals) on sys.stdout is to make sure sys.stdout is a file-like object that is capable of, and optionally tolerant about, what you need to print:

When sys.stdout.encoding is None for some reason, is missing, is wrong, or claims "less" than what the stdout terminal or stream is really capable of, try to provide a correct .encoding attribute, as a last resort by replacing sys.stdout and sys.stderr with a translating file-like object.
When the terminal or stream still cannot encode all occurring unicode chars, and you don't want print to fail just because of that, you can introduce an encode-with-replace behavior in the translating file-like object.

Here is an example:

#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()

    us = u'aouäöüфżß²'
    print us
    sys.stdout.flush()
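
The SmartStdout class above is Python 2 code. As a rough Python 3 counterpart of the same encode-with-replace idea, the sketch below wraps a binary stream in io.TextIOWrapper with errors='backslashreplace'. The helper name make_tolerant_stream is hypothetical, and an in-memory buffer stands in for a terminal that can only handle ascii.

```python
import io

def make_tolerant_stream(binary_stream, encoding='ascii'):
    """Wrap a binary stream so unencodable characters are escaped, not fatal."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors='backslashreplace', newline='\n')

buffer = io.BytesIO()              # stands in for an ascii-only terminal
out = make_tolerant_stream(buffer)
out.write(u'aouäöüфżß²')           # same sample string as in the example above
out.flush()

escaped = buffer.getvalue()
print(escaped)
```

On Python 3.7 and later, sys.stdout.reconfigure(errors='backslashreplace') should give a similar effect on the real stream without replacing the object.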

Using beyond-ascii plain string literals in Python 2 / 2 + 3 code

I think the only good reason to change the global default encoding (and only to UTF-8) is an application source code decision, not I/O stream encoding issues: it lets you write beyond-ascii string literals in code without always having to use the u'string' style. This can be done rather consistently (despite what anonbadger's article says) by maintaining a Python 2 or Python 2 + 3 code base that uses ascii or UTF-8 plain string literals throughout, at least wherever those strings may undergo silent unicode conversion, move between modules, or go to stdout. For that, prefer "# encoding: utf-8" or ascii (no declaration). Change or drop libraries that still depend on ascii default-encoding errors being raised for characters beyond chr #127 (which is rare today).

And do like this at application start (and/or via sitecustomize.py) in addition to the SmartStdout scheme above - without using reload(sys):

...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8') 
    s = 'aouäöüфżß²'
    print s

This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if the code were Python 3 only. File I/O of course always needs special care regarding encodings, as it does in Python 3.

Note: plain strings are then implicitly converted from utf-8 to unicode in SmartStdout before being converted to the output stream encoding.

Here is the approach I used to produce code that was compatible with both python2 and python3 and always produced utf8 output. I found this answer elsewhere, but I can't remember the source.

This approach works by replacing sys.stdout with something that isn't quite file-like (but still only uses things in the standard library). This may well cause problems for your underlying libraries, but in the simple case where you have good control over how sys.stdout is used through your framework, this can be a reasonable approach.

import io, sys
sys.stdout = io.open(sys.stdout.fileno(), 'w', encoding='utf8')
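
To see what wrapping a file descriptor with an explicit encoding does to the bytes, here is a small sketch that uses the same io.open call on a pipe instead of the real stdout. The pipe is only a stand-in chosen so the written bytes can be read back and inspected.

```python
import io
import os

read_fd, write_fd = os.pipe()      # stands in for the stdout file descriptor

# Same call as for sys.stdout, but on a descriptor this script owns.
writer = io.open(write_fd, 'w', encoding='utf8')
writer.write(u'Piña')
writer.close()                     # flushes and closes write_fd

with io.open(read_fd, 'rb') as reader:
    data = reader.read()           # raw bytes exactly as they were written

print(data)
```

When wrapping the real sys.stdout.fileno(), passing closefd=False may be worth considering, so that closing or garbage-collecting the wrapper does not close the underlying descriptor.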

This fixed the issue for me.

import os
os.environ["PYTHONIOENCODING"] = "utf-8"
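
One caveat worth checking: PYTHONIOENCODING is read when an interpreter starts, so assigning it through os.environ should only affect Python processes launched afterwards. The sketch below assumes Python 3 and uses latin-1 purely so the effect is visible in the child's output bytes.

```python
import os
import subprocess
import sys

# Setting the variable here does not change this interpreter's own streams ...
os.environ["PYTHONIOENCODING"] = "latin-1"

# ... but a child interpreter started afterwards inherits it.
child_code = "print(u'\\xe9')"     # the child prints a single e-acute
raw_output = subprocess.check_output([sys.executable, "-c", child_code])

child_bytes = raw_output.strip()   # drop the trailing newline
print(child_bytes)
```

A single byte 0xE9 means the child encoded its output as latin-1; with utf-8 the same character would take two bytes.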

This is a quick hack for anyone who is (1) on a Windows platform, (2) running Python 2.7, and (3) annoyed because a nice piece of software (i.e., not written by you, so not immediately a candidate for encode/decode printing maneuvers) won't display the "pretty unicode characters" in the IDLE environment (Pythonwin prints unicode fine). An example is the neat First Order Logic symbols that Stephan Boyer uses in the output from his pedagogic prover at First Order Logic Prover.

I didn't like the idea of forcing a sys reload and I couldn't get the system to cooperate with setting environment variables like PYTHONIOENCODING (tried direct Windows environment variable and also dropping that in a sitecustomize.py in site-packages as a one liner ='utf-8').

So, if you are willing to hack your way to success, go to your IDLE directory, typically: "C:\Python27\Lib\idlelib" Locate the file IOBinding.py. Make a copy of that file and store it somewhere else so you can revert to original behavior when you choose. Open the file in the idlelib with an editor (e.g., IDLE). Go to this code area:

# Encoding for file names
filesystemencoding = sys.getfilesystemencoding()

encoding = "ascii"
if sys.platform == 'win32':
    # On Windows, we could use "mbcs". However, to give the user
    # a portable encoding name, we need to find the code page 
    try:
        # --> 6/5/17 hack to force IDLE to display utf-8 rather than cp1252
        # --> encoding = locale.getdefaultlocale()[1]
        encoding = 'utf-8'
        codecs.lookup(encoding)
    except LookupError:
        pass

In other words, comment out the original code line following the 'try' that set the encoding variable to locale.getdefaultlocale()[1] (because that will give you cp1252, which you don't want) and instead force it to 'utf-8' (by adding the line encoding = 'utf-8' as shown).

I believe this only affects IDLE display to stdout and not the encoding used for file names etc. (that is obtained in filesystemencoding just before). If you have a problem with any other code you run in IDLE later, just replace the IOBinding.py file with the original unmodified file.

If you want to write Spanish words (para escribir la ñ en python)

#!/usr/bin/env python
# -*- coding: iso-8859-15 -*-

print "Piña"
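
The coding line only tells Python how to decode the bytes of the source file, so the file has to be saved in that same encoding. As a small Python 3 sketch of why the two must agree, the following compares the bytes of the same word under both encodings; the variable names are illustrative.

```python
name = u'Piña'

# The same text is stored as different bytes depending on the encoding.
as_latin9 = name.encode('iso-8859-15')   # ñ is the single byte 0xF1
as_utf8 = name.encode('utf-8')           # ñ is the two bytes 0xC3 0xB1

# Reading UTF-8 bytes with the wrong declaration garbles the word.
mismatch = as_utf8.decode('iso-8859-15')

print(as_latin9, as_utf8, mismatch != name)
```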

Reference URL: https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python
