문자열에서 여러 문자를 바꾸는 가장 좋은 방법은 무엇입니까?
나는 다음과 같은 몇 가지 문자를 대체해야합니다 &
➔ \&
, #
➔ \#
, ...
나는 다음과 같이 코딩했지만 더 좋은 방법이 있어야한다고 생각합니다. 힌트가 있습니까?
strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...
두 문자 교체
나는 현재 답변의 모든 방법을 하나 더 추가했습니다.
abc&def#ghi
&-> \ & 및 #-> # 의 입력 문자열을 사용하여 다음 과 같이 대체를 연결하는 것이 가장 빠릅니다 text.replace('&', '\&').replace('#', '\#')
.
각 기능의 타이밍 :
- a) 1000000 루프, 루프 당 3 : 1.47 μs 최고
- b) 1000000 루프, 루프 당 3 : 1 최고 1.51 μs
- c) 100000 루프, 루프 당 3 : 12.3 μs
- d) 100000 루프, 루프 당 3:12 μs 중 최고
- e) 100000 루프, 루프 당 3 : 3.27 μs 중 최고
- f) 1000000 루프, 루프 당 3 : 0.817 μs
- g) 100000 루프, 루프 당 3 : 3.64 μs 중 최고
- h) 1000000 루프, 루프 당 3 : 0.927 μs
- i) 1000000 루프, 루프 당 3 : 최고 0.814 μs
기능은 다음과 같습니다.
def a(text):
chars = "&#"
for c in chars:
text = text.replace(c, "\\" + c)
def b(text):
for ch in ['&','#']:
if ch in text:
text = text.replace(ch,"\\"+ch)
import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\\1', text)
RX = re.compile('([&#])')
def d(text):
text = RX.sub(r'\\\1', text)
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
esc(text)
def f(text):
text = text.replace('&', '\&').replace('#', '\#')
def g(text):
replacements = {"&": "\&", "#": "\#"}
text = "".join([replacements.get(c, c) for c in text])
def h(text):
text = text.replace('&', r'\&')
text = text.replace('#', r'\#')
def i(text):
text = text.replace('&', r'\&').replace('#', r'\#')
이 같은 시간 :
python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
17 자 교체
다음과 동일하지만 이스케이프 할 문자가 더 많은 유사한 코드가 있습니다 (\`* _ {}> # +-.! $) :
def a(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)
def b(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)
import re
def c(text):
rx = re.compile('([&#])')
text = rx.sub(r'\\\1', text)
RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
text = RX.sub(r'\\\1', text)
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
esc(text)
def f(text):
text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')
def g(text):
replacements = {
"\\": "\\\\",
"`": "\`",
"*": "\*",
"_": "\_",
"{": "\{",
"}": "\}",
"[": "\[",
"]": "\]",
"(": "\(",
")": "\)",
">": "\>",
"#": "\#",
"+": "\+",
"-": "\-",
".": "\.",
"!": "\!",
"$": "\$",
}
text = "".join([replacements.get(c, c) for c in text])
def h(text):
text = text.replace('\\', r'\\')
text = text.replace('`', r'\`')
text = text.replace('*', r'\*')
text = text.replace('_', r'\_')
text = text.replace('{', r'\{')
text = text.replace('}', r'\}')
text = text.replace('[', r'\[')
text = text.replace(']', r'\]')
text = text.replace('(', r'\(')
text = text.replace(')', r'\)')
text = text.replace('>', r'\>')
text = text.replace('#', r'\#')
text = text.replace('+', r'\+')
text = text.replace('-', r'\-')
text = text.replace('.', r'\.')
text = text.replace('!', r'\!')
text = text.replace('$', r'\$')
def i(text):
text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')
동일한 입력 문자열에 대한 결과는 다음과 같습니다 abc&def#ghi
.
- a) 100000 루프, 루프 당 3 : 6.72 μs 최고
- b) 100000 루프, 루프 당 3 : 3.64 μs 중 최고
- c) 100000 루프, 루프 당 3 : 9.9 μs 중 최고
- d) 100000 루프, 루프 당 3 : 3.92 μs 최고
- e) 100000 루프, 루프 당 3 : 3.96 μs 최고
- f) 100000 루프, 루프 당 3 : 3.29 μs 최고
- g) 100000 루프, 루프 당 3 : 3.68 μs 중 최고
- h) 100000 루프, 루프 당 3 : 3.73 μs 중 최고
- i) 100000 루프, 루프 당 3 : 3.24 μs 최고
그리고 더 긴 입력 문자열 ( ## *Something* and [another] thing in a longer sentence with {more} things to replace$
)
- a) 100000 루프, 루프 당 3 : 59.59 μs
- b) 100000 loops, best of 3: 6.54 μs per loop
- c) 100000 loops, best of 3: 16.9 μs per loop
- d) 100000 loops, best of 3: 7.29 μs per loop
- e) 100000 loops, best of 3: 12.2 μs per loop
- f) 100000 loops, best of 3: 5.38 μs per loop
- g) 10000 loops, best of 3: 21.7 μs per loop
- h) 100000 loops, best of 3: 5.7 μs per loop
- i) 100000 loops, best of 3: 5.13 μs per loop
Adding a couple of variants:
def ab(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
text = text.replace(ch,"\\"+ch)
def ba(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
if c in text:
text = text.replace(c, "\\" + c)
With the shorter input:
- ab) 100000 loops, best of 3: 7.05 μs per loop
- ba) 100000 loops, best of 3: 2.4 μs per loop
With the longer input:
- ab) 100000 loops, best of 3: 7.71 μs per loop
- ba) 100000 loops, best of 3: 6.08 μs per loop
So I'm going to use ba
for readability and speed.
Addendum
Prompted by haccks in the comments, one difference between ab
and ba
is the if c in text:
check. Let's test them against two more variants:
def ab_with_check(text):
for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace(ch,"\\"+ch)
def ba_without_check(text):
chars = "\\`*_{}[]()>#+-.!$"
for c in chars:
text = text.replace(c, "\\" + c)
Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.
╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘
We can conclude that:
Those with the check are up to 4x faster than those without the check
ab_with_check
is slightly in the lead on Python 3, butba
(with check) has a greater lead on Python 2However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and fastest on Python 2!
>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
... if ch in string:
... string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi
Simply chain the replace
functions like this
strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi
If the replacements are going to be more in number, you can do this in this generic way
strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi
Here is a python3 method using str.translate
and str.maketrans
:
s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))
The printed string is abc\&def\#ghi
.
Are you always going to prepend a backslash? If so, try
import re
rx = re.compile('([&#])')
# ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)
It may not be the most efficient method but I think it is the easiest.
You may consider writing a generic escape function:
def mk_esc(esc_chars):
return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1
This way you can make your function configurable with a list of character that should be escaped.
Late to the party, but I lost a lot of time with this issue until I found my answer.
Short and sweet, translate
is superior to replace
. If you're more interested in funcionality over time optimization, do not use replace
.
Also use translate
if you don't know if the set of characters to be replaced overlaps the set of characters used to replace.
Case in point:
Using replace
you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4")
to return "2344"
, but it will return in fact "4444"
.
Translation seems to perform what OP originally desired.
FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I'm aware of this).
As a somewhat ridiculous but interesting exercise, wanted to see if I could use python functional programming to replace multiple chars. I'm pretty sure this does NOT beat just calling replace() twice. And if performance was an issue, you could easily beat this in rust, C, julia, perl, java, javascript and maybe even awk. It uses an external 'helpers' package called pytoolz, accelerated via cython (cytoolz, it's a pypi package).
from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))
I'm not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.
Using reduce which is available in python2.7 and python3.* you can easily replace mutiple substrings in a clean and pythonic way.
# Lets define a helper method to make it easy to use
def replacer(text, replacements):
return reduce(
lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
replacements, text
)
if __name__ == '__main__':
uncleaned_str = "abc&def#ghi"
cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
print(cleaned_str) # "abc\&def\#ghi"
In python2.7 you don't have to import reduce but in python3.* you have to import it from the functools module.
>>> a = '&#'
>>> print a.replace('&', r'\&')
\&#
>>> print a.replace('#', r'\#')
&\#
>>>
You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string), since raw strings to not treat the backslash specially.
참고URL : https://stackoverflow.com/questions/3411771/best-way-to-replace-multiple-characters-in-a-string
'Programing' 카테고리의 다른 글
meld와 분기의 차이점을 보시겠습니까? (0) | 2020.06.13 |
---|---|
jquery를 사용하여 여러 선택 상자 값을 얻는 방법은 무엇입니까? (0) | 2020.06.13 |
Emacs에서 전체 라인을 어떻게 복제합니까? (0) | 2020.06.13 |
ActiveRecord의 무작위 레코드 (0) | 2020.06.13 |
반복되는 문자로 채워진 가변 길이의 문자열을 만듭니다. (0) | 2020.06.13 |