Build an ASCII chart of the most commonly used words in a given text

crosscheck 2020. 6. 5. 18:55

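The challenge:

Build an ASCII chart of the most commonly used words in a given text.

The rules:

Only accept a-z and A-Z (alphabetic characters) as part of a word.
Ignore casing (She == she for our purpose).
Ignore the following words: the, and, of, to, a, i, it, in, or, is
Clarification: considering don't: this would be taken as 2 different 'words' in the ranges a-z and A-Z (don and t).
Optionally (it's too late to formally change the specification now) you may choose to drop all single-letter 'words' (this could potentially shorten the ignore list too).

Parse a given text (read from a file specified via command-line arguments or piped in; presume us-ascii) and build us a word frequency chart with the following characteristics:

Display the chart (see the example below) for the 22 most common words (ordered by descending frequency).
The bar width represents the number of occurrences (frequency) of the word, proportionally. Append one space and print the word.
Make sure these bars (plus space-word-space) always fit: bar + [space] + word + [space] should always be <= 80 characters. Account for differing bar and word lengths: for example, the second most common word could be a lot longer than the first while not differing much in frequency. Maximize bar width within these constraints and scale the bars appropriately (according to the frequencies they represent).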
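
The rules above can be sketched as a short, ungolfed reference in Python 3. This is only one reading of the spec, not any contestant's solution: the names STOP_WORDS, top_words and render_chart are made up for illustration, ties are broken alphabetically (the challenge does not say how), and exact fractions are used so the limiting bar is not shortened by floating-point rounding.

```python
import re
from collections import Counter
from fractions import Fraction

# Ignore list from the rules above.
STOP_WORDS = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}

def top_words(text, limit=22):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())   # only a-z counts as a word
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Alphabetical tie-break is an assumption; the spec is silent on ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

def render_chart(pairs, width=80):
    """Draw the chart so '|bar|' + ' ' + word + ' ' never exceeds `width`."""
    if not pairs:
        return []
    room = width - 4                      # two '|' plus two spaces
    # Largest underscores-per-occurrence ratio that lets every line fit.
    scale = min(Fraction(room - len(w), c) for w, c in pairs)
    bars = ["_" * int(c * scale) for _, c in pairs]
    lines = [" " + bars[0]]               # top edge of the first bar
    lines += ["|%s| %s" % (bar, w) for bar, (w, _) in zip(bars, pairs)]
    return lines

if __name__ == "__main__":
    sample = "She said that she saw the cat, and the cat said: don't!"
    print("\n".join(render_chart(top_words(sample))))
```

The trailing space required by the spec is counted in the width budget but not printed.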

An example:

The text for the example can be found here (Alice's Adventures in Wonderland, by Lewis Carroll).

This specific text would yield the following chart:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so

For your information, these are the frequencies the above chart is built upon:

[('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358),
('that', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227),
('s', 219), ('t', 218), ('on', 204), ('all', 200), ('this', 181),
('for', 179), ('had', 178), ('but', 175), ('be', 167), ('not', 166),
('they', 155), ('so', 152)]
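
As a quick check on how these counts turn into bar widths: she is both the most frequent word and a short one, so it can take the full 76 - 3 = 73 underscores, and every other bar is scaled by 73/553. A small Python sketch of that arithmetic (variable names are illustrative, and truncation rather than rounding is assumed):

```python
# First few counts from the list above.
freqs = [("she", 553), ("you", 481), ("said", 462), ("alice", 403)]

top_word, top_count = freqs[0]
top_width = 76 - len(top_word)          # 80 minus '|', '|' and two spaces

# Integer arithmetic: floor(count * top_width / top_count)
widths = {w: c * top_width // top_count for w, c in freqs}
print(widths["she"], widths["you"])     # 73 63
```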

A second example (to check whether you implemented the complete spec): replace every occurrence of you in the linked Alice in Wonderland file with superlongstringstring:

 ________________________________________________________________
|________________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|______________________________| as
|___________________________| her
|_________________________| with
|_________________________| at
|________________________| s
|________________________| t
|______________________| on
|_____________________| all
|___________________| this
|___________________| for
|___________________| had
|__________________| but
|_________________| be
|_________________| not
|________________| they
|________________| so

The winner:

Shortest solution (by character count, per language). Have fun!

Edit: Table summarizing the results so far (2012-02-15) (originally added by user Nas Banov):

Language            Relaxed  Strict
=========           =======  ======
GolfScript          130      143
Perl                         185
Windows PowerShell  148      199
Mathematica                  199
Ruby                185      205
Unix Toolchain      194      228
Python              183      243
Clojure                      282
Scala                        311
Haskell                      333
Awk                          336
R                   298
Javascript          304      354
Groovy              321
Matlab                       404
C#                           422
Smalltalk           386
PHP                 450
F#                           452
TSQL                483      507

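The numbers represent the length of the shortest solution in a specific language. "Strict" refers to a solution that implements the spec completely (draws |____| bars, closes the first bar on top with a ____ line, accounts for the possibility of long words with high frequency, etc.). "Relaxed" means some liberties were taken to shorten the solution.

Only solutions shorter than 500 characters are included. The list of languages is sorted by the length of the 'strict' solution. 'Unix Toolchain' is used to signify various solutions that use a traditional *nix shell plus a mix of tools (like grep, tr, sort, uniq, head, perl, awk).

LabVIEW 51 nodes, 5 structures, 10 diagrams

Teaching the elephant to tap-dance is never pretty. I'll, ah, skip the character count.

The program flows from left to right.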

Ruby 1.9, 185 chars

(Heavily based on the other Ruby solutions)

w=($<.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort[0,22]
k,l=w[0]
puts [?\s+?_*m=76-l.size,w.map{|f,x|?|+?_*(f*m/k)+"| "+x}]

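Instead of using any command-line switches like the other solutions, you can simply pass the filename as an argument (i.e. ruby1.9 wordfrequency.rb Alice.txt).

Since I'm using character literals here, this solution only works in Ruby 1.9.

Edit: Replaced semicolons with line breaks for "readability". :P

Edit 2: Shtééf pointed out that I forgot the trailing space.

Edit 3: Removed the trailing space again. ;)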

GolfScript, 177 175 173 167 164 163 144 131 130 chars

Slow - 3 minutes for the sample text (130)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*' '\@{"
|"\~1*2/0*'| '@}/

Explanation:

{           #loop through all characters
 32|.       #convert to lowercase (OR with 32) and duplicate
 123%97<    #determine if is a letter
 n@if       #return either the letter or a newline
}%          #return an array (of ints)
]''*        #convert array to a string with magic
n%          #split on newline, removing blanks (stack is an array of words now)
"oftoitinorisa"   #push this string
2/          #split into groups of two, i.e. ["of" "to" "it" "in" "or" "is" "a"]
-           #remove any occurrences from the text
"theandi"3/-#remove "the", "and", and "i"
$           #sort the array of words
(1@         #takes the first word in the array, pushes a 1, reorders stack
            #the 1 is the current number of occurrences of the first word
{           #loop through the array
 .3$>1{;)}if#increment the count or push the next word and a 1
}/
]2/         #gather stack into an array and split into groups of 2
{~~\;}$     #sort by the latter element - the count of occurrences of each word
22<         #take the first 22 elements
.0=~:2;     #store the highest count
,76\-:1     #store the length of the first line
'_':0*' '\@ #make the first line
{           #loop through each word
"
|"\~        #start drawing the bar
1*2/0       #divide by zero
*'| '@      #finish drawing the bar
}/
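
The counting step in the explanation above (sort the words, then walk the sorted array while either incrementing a counter or starting a new word) is a run-length count. A rough Python analogue, using itertools.groupby in place of the hand-written loop; this illustrates the technique only and is not a translation of the GolfScript:

```python
from itertools import groupby

def count_sorted_runs(words):
    """Count occurrences by sorting and measuring runs of equal words."""
    pairs = [(word, sum(1 for _ in run)) for word, run in groupby(sorted(words))]
    # Highest count first; sorted() is stable, so ties stay alphabetical.
    return sorted(pairs, key=lambda p: -p[1])

print(count_sorted_runs(["b", "a", "b", "c", "a", "b"]))
# [('b', 3), ('a', 2), ('c', 1)]
```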

"Correct" (hopefully). (143)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<..0=1=:^;{~76@,-^*\/}%$0=:1'_':0*' '\@{"
|"\~1*^/0*'| '@}/

Slow - 30 minutes. (162)

'"'/' ':S*n/S*'"#{%q
'\+"
.downcase.tr('^a-z','
')}\""+~n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*S\@{"
|"\~1*2/0*'| '@}/

Output is visible in the revision logs.

206

shell, grep, tr, grep, sort, uniq, sort, head, perl

~ % wc -c wfg
209 wfg
~ % cat wfg
egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|of|to|a|i|it|in|or|is'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
~ % # usage:
~ % sh wfg < 11.txt

~~hm, just seen above: sort -nr -> sort -n and then head -> tail => 208 :)~~
update2: erm, of course the above is silly, as the output would then be reversed. So, 209.
update3: optimized the exclusion regexp -> 206

egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
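
The shortened pattern from update3 is meant to match exactly the same ten words as the spelled-out alternation. That can be checked with a few lines of Python; re.fullmatch stands in here for egrep's -w whole-word matching, which is an approximation made for this check only:

```python
import re

STOP_WORDS = "the and of to a i it in or is".split()
short = re.compile(r"the|and|o[fr]|to|a|i[tns]?")

# Every stop word must match the shortened pattern in full...
assert all(short.fullmatch(w) for w in STOP_WORDS)

# ...and expanding the character classes by hand must give nothing new.
expanded = ({"the", "and", "to", "a"}
            | {"o" + c for c in "fr"}
            | {"i" + c for c in ("", "t", "n", "s")})
assert expanded == set(STOP_WORDS)

# Ordinary words are left alone.
assert not any(short.fullmatch(w) for w in ["alice", "on", "if", "an", "tot"])
print("patterns agree")
```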

For fun, here's a perl-only version (much faster):

~ % wc -c pgolf
204 pgolf
~ % cat pgolf
perl -lne'$1=~/^(the|and|o[fr]|to|.|i[tns])$/i||$f{lc$1}++while/\b([a-z]+)/gi}{@w=(sort{$f{$b}<=>$f{$a}}keys%f)[0..21];$Q=$f{$_=$w[0]};$B=76-y///c;print" "."_"x$B;print"|"."_"x($B*$f{$_}/$Q)."| $_"for@w'
~ % # usage:
~ % sh pgolf < 11.txt

Transact SQL set-based solution (SQL Server 2005) 1063 892 873 853 827 820 783 683 647 644 630 chars

Thanks to Gabe for some useful suggestions to reduce the character count.

NB: Line breaks were added to avoid scrollbars; only the last line break is required.

DECLARE @ VARCHAR(MAX),@F REAL SELECT @=BulkColumn FROM OPENROWSET(BULK'A',
SINGLE_BLOB)x;WITH N AS(SELECT 1 i,LEFT(@,1)L UNION ALL SELECT i+1,SUBSTRING
(@,i+1,1)FROM N WHERE i<LEN(@))SELECT i,L,i-RANK()OVER(ORDER BY i)R INTO #D
FROM N WHERE L LIKE'[A-Z]'OPTION(MAXRECURSION 0)SELECT TOP 22 W,-COUNT(*)C
INTO # FROM(SELECT DISTINCT R,(SELECT''+L FROM #D WHERE R=b.R FOR XML PATH
(''))W FROM #D b)t WHERE LEN(W)>1 AND W NOT IN('the','and','of','to','it',
'in','or','is')GROUP BY W ORDER BY C SELECT @F=MIN(($76-LEN(W))/-C),@=' '+
REPLICATE('_',-MIN(C)*@F)+' 'FROM # SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W FROM # ORDER BY C PRINT @

Readable version

DECLARE @  VARCHAR(MAX),
        @F REAL
SELECT @=BulkColumn
FROM   OPENROWSET(BULK'A',SINGLE_BLOB)x; /*  Loads text file from path
                                             C:\WINDOWS\system32\A  */

/*Recursive common table expression to
generate a table of numbers from 1 to string length
(and associated characters)*/
WITH N AS
     (SELECT 1 i,
             LEFT(@,1)L

     UNION ALL

     SELECT i+1,
            SUBSTRING(@,i+1,1)
     FROM   N
     WHERE  i<LEN(@)
     )
  SELECT   i,
           L,
           i-RANK()OVER(ORDER BY i)R
           /*Will group characters
           from the same word together*/
  INTO     #D
  FROM     N
  WHERE    L LIKE'[A-Z]'OPTION(MAXRECURSION 0)
             /*Assuming case insensitive accent sensitive collation*/

SELECT   TOP 22 W,
         -COUNT(*)C
INTO     #
FROM     (SELECT DISTINCT R,
                          (SELECT ''+L
                          FROM    #D
                          WHERE   R=b.R FOR XML PATH('')
                          )W
                          /*Reconstitute the word from the characters*/
         FROM             #D b
         )
         T
WHERE    LEN(W)>1
AND      W NOT IN('the',
                  'and',
                  'of' ,
                  'to' ,
                  'it' ,
                  'in' ,
                  'or' ,
                  'is')
GROUP BY W
ORDER BY C

/*Just noticed this looks risky as it relies on the order of evaluation of the 
 variables. I'm not sure that's guaranteed but it works on my machine :-) */
SELECT @F=MIN(($76-LEN(W))/-C),
       @ =' '      +REPLICATE('_',-MIN(C)*@F)+' '
FROM   #

SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W
             FROM     #
             ORDER BY C

PRINT @
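
The i-RANK()OVER(ORDER BY i) expression in the query above is what turns single letters back into words: among the rows that survive the letter filter, position minus rank stays constant for as long as the positions are consecutive, so each word gets its own value of R. A small Python sketch of the same idea (the function name is illustrative, and plain ASCII input is assumed):

```python
from itertools import groupby

def words_by_position_minus_rank(text):
    # Keep (position, char) for letters only, like the WHERE L LIKE '[A-Z]' filter.
    letters = [(i, ch) for i, ch in enumerate(text, start=1) if ch.isalpha()]
    # rank is the 1-based index among surviving rows; i - rank is constant per word.
    keyed = [(i - rank, ch) for rank, (i, ch) in enumerate(letters, start=1)]
    return ["".join(ch for _, ch in grp)
            for _, grp in groupby(keyed, key=lambda t: t[0])]

print(words_by_position_minus_rank("don't stop"))   # ['don', 't', 'stop']
```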

Output

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| You
|____________________________________________________________| said
|_____________________________________________________| Alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| This
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| So
|___________________| very
|__________________| what

And with the long string

 _______________________________________________________________ 
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| Alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| at
|_________________________| with
|_______________________| on
|______________________| all
|____________________| This
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| So
|________________| very
|________________| what

Ruby 207 213 211 210 207 203 201 200 chars

An improvement on Anurag's solution, incorporating a suggestion from rfusca. Also removes the argument to sort and applies a few other minor golfings.

w=(STDIN.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort.take 22;k,l=w[0];m=76.0-l.size;puts' '+'_'*m;w.map{|f,x|puts"|#{'_'*(m*f/k)}| #{x} "}

Execute as:

ruby GolfedWordFrequencies.rb < Alice.txt

Edit: Put 'puts' back in; it needs to be there to avoid quotes in the output.
Edit2: Changed File -> IO
Edit3: Removed /i
Edit4: Removed the parentheses around (f*1.0), recounted
Edit5: Use string addition for the first line; expand s in-place.
Edit6: Made m a float, removed 1.0. EDIT: Doesn't work, changes lengths. EDIT: No worse than before.
Edit7: Use STDIN.read.

Mathematica (297 284 248 244 242 199 chars) Pure Functional

and Zipf's Law Testing

Look Mamma ... no vars, no hands, .. no head

Edit 1> some shorthands defined (284 chars)

f[x_, y_] := Flatten[Take[x, All, y]]; 

BarChart[f[{##}, -1], 
         BarOrigin -> Left, 
         ChartLabels -> Placed[f[{##}, 1], After], 
         Axes -> None
] 
& @@
Take[
  SortBy[
     Tally[
       Select[
        StringSplit[ToLowerCase[Import[i]], RegularExpression["\\W+"]], 
       !MemberQ[{"the", "and", "of", "to", "a", "i", "it", "in", "or","is"}, #]&]
     ], 
  Last], 
-22]

Some explanations

Import[] 
   # Get The File

ToLowerCase []
   # To Lower Case :)

StringSplit[ STRING , RegularExpression["\\W+"]]
   # Split By Words, getting a LIST

Select[ LIST, !MemberQ[{LIST_TO_AVOID}, #]&]
   #  Select from LIST except those words in LIST_TO_AVOID
   #  Note that !MemberQ[{LIST_TO_AVOID}, #]& is a FUNCTION for the test

Tally[LIST]
   # Get the LIST {word,word,..} 
     and produce another  {{word,counter},{word,counter}...}

SortBy[ LIST ,Last]
   # Get the list produced by Tally and sort by counters
     Note that counters are the LAST element of {word,counter}

Take[ LIST ,-22]
   # Once sorted, get the biggest 22 counters

BarChart[f[{##}, -1], ChartLabels -> Placed[f[{##}, 1], After]] &@@ LIST
   # Get the list produced by Take as input and produce a bar chart

f[x_, y_] := Flatten[Take[x, All, y]]
   # Auxiliary to get the list of the first or second element of lists of lists x_
     depending upon y
   # So f[{##}, -1] is the list of counters
   # and f[{##}, 1] is the list of words (labels for the chart)

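Output

(image: http://i49.tinypic.com/2n8mrer.jpg)

Mathematica is not well suited for golfing, and that is just because of the long, descriptive function names. Functions like "RegularExpression[]" or "StringSplit[]" just make me sob :(.

Zipf's Law Testing

Zipf's law predicts that for a natural-language text, the plot of Log(Rank) against Log(occurrences) follows a linear relationship.

The law is used in developing algorithms for cryptography and data compression. (But it is NOT the "Z" in the LZW algorithm.)

In our text, we can test it with the following: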

 f[x_, y_] := Flatten[Take[x, All, y]]; 
 ListLogLogPlot[
     Reverse[f[{##}, -1]], 
     AxesLabel -> {"Log (Rank)", "Log Counter"}, 
     PlotLabel -> "Testing Zipf's Law"]
 & @@
 Take[
  SortBy[
    Tally[
       StringSplit[ToLowerCase[b], RegularExpression["\\W+"]]
    ], 
   Last],
 -1000]

The result is (pretty well linear):

(image: http://i46.tinypic.com/33fcmdk.jpg)
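
The same test can be sketched without a plotting library: fit a straight line to log(count) against log(rank) and look at the slope, which Zipf's law puts near -1. The Python below is an illustrative least-squares fit, not a port of the Mathematica code; zipf_slope is a made-up name, and as in the Mathematica test no stop words are removed.

```python
import math
import re
from collections import Counter

def zipf_slope(text, top=1000):
    """Least-squares slope of log(count) against log(rank)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = [c for _, c in counts.most_common(top)]
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx   # needs at least two distinct words

# Synthetic text whose counts are exactly 12/rank, so the slope is -1.
demo = "a " * 12 + "b " * 6 + "c " * 4 + "d " * 3
print(round(zipf_slope(demo), 6))
```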

Edit 6 > (242 chars)

Refactored the regex (no Select function anymore)
Dropped 1-char words
More efficient definition for function "f"

f = Flatten[Take[#1, All, #2]]&; 
BarChart[
     f[{##}, -1], 
     BarOrigin -> Left, 
     ChartLabels -> Placed[f[{##}, 1], After], 
     Axes -> None] 
& @@
  Take[
    SortBy[
       Tally[
         StringSplit[ToLowerCase[Import[i]], 
          RegularExpression["(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"]]
       ],
    Last],
  -22]

Edit 7 → 199 chars

BarChart[#2, BarOrigin->Left, ChartLabels->Placed[#1, After], Axes->None]&@@ 
  Transpose@Take[SortBy[Tally@StringSplit[ToLowerCase@Import@i, 
    RegularExpression@"(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"],Last], -22]

Replaced f with Transpose and Slot (#1/#2) arguments.
We don't need no stinkin' brackets (use f@x instead of f[x] where possible)

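C# - 510 451 436 446 434 426 422 chars (minified)

Not that short, but now probably correct! Note that the previous version did not show the first line of the bars, did not scale the bars correctly, downloaded the file instead of getting it from stdin, and did not include all the required C# verbosity. You could easily shave many strokes if C# didn't need so much extra crap. Maybe PowerShell could do better.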

using C=System.Console;   // alias for Console
using System.Linq;  // for Split, GroupBy, Select, OrderBy, etc.

class Class // must define a class
{
    static void Main()  // must define a Main
    {
        // split into words
        var allwords = System.Text.RegularExpressions.Regex.Split(
                // convert stdin to lowercase
                C.In.ReadToEnd().ToLower(),
                // eliminate stopwords and non-letters
                @"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+")
            .GroupBy(x => x)    // group by words
            .OrderBy(x => -x.Count()) // sort descending by count
            .Take(22);   // take first 22 words

        // compute length of longest bar + word
        var lendivisor = allwords.Max(y => y.Count() / (76.0 - y.Key.Length));

        // prepare text to print
        var toPrint = allwords.Select(x=> 
            new { 
                // remember bar pseudographics (will be used in two places)
                Bar = new string('_',(int)(x.Count()/lendivisor)), 
                Word=x.Key 
            })
            .ToList();  // convert to list so we can index into it

        // print top of first bar
        C.WriteLine(" " + toPrint[0].Bar);
        toPrint.ForEach(x =>  // for each word, print its bar and the word
            C.WriteLine("|" + x.Bar + "| " + x.Word));
    }
}

422 chars with lendivisor inlined (which makes it 22 times slower) in the form below (newlines used in place of select spaces):

using System.Linq;using C=System.Console;class M{static void Main(){var
a=System.Text.RegularExpressions.Regex.Split(C.In.ReadToEnd().ToLower(),@"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+").GroupBy(x=>x).OrderBy(x=>-x.Count()).Take(22);var
b=a.Select(x=>new{p=new string('_',(int)(x.Count()/a.Max(y=>y.Count()/(76d-y.Key.Length)))),t=x.Key}).ToList();C.WriteLine(" "+b[0].p);b.ForEach(x=>C.WriteLine("|"+x.p+"| "+x.t));}}

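Perl, 237 229 209 chars

(Updated again to beat the Ruby version with more dirty golf tricks, replacing split/[^a-z/,lc with lc=~/[a-z]+/g, and eliminating a check for the empty string in another place. These were inspired by the Ruby version, so credit where credit is due.)

Update: now with Perl 5.10! Replace print with say, and use ~~ to avoid a map. This has to be invoked on the command line as perl -E '<one-liner>' alice.txt. Since the entire script is on one line, writing it as a one-liner shouldn't present any difficulty. :)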

 @s=qw/the and of to a i it in or is/;$c{$_}++foreach grep{!($_~~@s)}map{lc=~/[a-z]+/g}<>;@s=sort{$c{$b}<=>$c{$a}}keys%c;$f=76-length$s[0];say" "."_"x$f;say"|"."_"x($c{$_}/$c{$s[0]}*$f)."| $_ "foreach@s[0..21];

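Note that this version normalizes for case. This doesn't shorten the solution any, since removing ,lc (for lowercasing) requires you to add A-Z to the split regex, so it's a wash.

If you're on a system where a newline is one character and not two, you can shorten this by another two chars by using a literal newline in place of \n. However, I haven't written the above sample that way, since it's "clearer" (ha!) this way.

Here is a mostly correct, but not remotely short enough, perl solution: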

use strict;
use warnings;

my %short = map { $_ => 1 } qw/the and of to a i it in or is/;
my %count = ();

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-zA-Z]/ } (<>);
my @sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
my $widest = 76 - (length $sorted[0]);

print " " . ("_" x $widest) . "\n";
foreach (@sorted)
{
    my $width = int(($count{$_} / $count{$sorted[0]}) * $widest);
    print "|" . ("_" x $width) . "| $_ \n";
}

The following is about as short as it can get while remaining relatively readable. (392 chars).

%short = map { $_ => 1 } qw/the and of to a i it in or is/;
%count;

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-z]/, lc } (<>);
@sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
$widest = 76 - (length $sorted[0]);

print " " . "_" x $widest . "\n";
print"|" . "_" x int(($count{$_} / $count{$sorted[0]}) * $widest) . "| $_ \n" foreach @sorted;

Windows PowerShell, 199 chars

$x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *
filter f($w){' '+'_'*$w
$x[-1..-22]|%{"|$('_'*($w*$_.Count/$x[-1].Count))| "+$_.Name}}
f(76..1|?{!((f $_)-match'.'*80)})[0]

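(The last line break isn't necessary, but is included here for readability.)

(Current code and my test files are available in my SVN repository. I hope my test cases catch the most common errors (bar length, problems with regex matching and a few others).)

Assumptions:

US ASCII as input. It probably gets weird with Unicode.
At least two non-stop words in the text

History

Relaxed version (137), since that's counted separately by now, apparently: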

($x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *)[-1..-22]|%{"|$('_'*(76*$_.Count/$x[-1].Count))| "+$_.Name}

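does not close the first bar
does not account for the word length of any word but the first

Bar lengths that differ by one character from other solutions are due to PowerShell using rounding instead of truncation when converting floating-point numbers to integers. Since the task only required proportional bar lengths, this should be fine.

Compared to other solutions, I took a slightly different approach to determining the longest bar length: simply try the candidates and take the largest length for which no line is longer than 80 characters.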
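
The trial-and-error approach described above can be sketched in Python as follows. This is not a port of the PowerShell code: render and widest_fit are hypothetical names, and floor division is used where PowerShell would round, so individual bars may differ by one character.

```python
def render(pairs, top_width):
    """Chart lines when the most frequent word gets `top_width` underscores."""
    top_count = pairs[0][1]
    bars = ["_" * (top_width * c // top_count) for _, c in pairs]
    return [" " + bars[0]] + ["|%s| %s" % (b, w) for b, (w, _) in zip(bars, pairs)]

def widest_fit(pairs, limit=80):
    """Try 76, 75, ... and keep the first width whose lines all fit."""
    for top_width in range(76, 0, -1):
        # +1 accounts for the trailing space the spec counts after the word.
        if all(len(line) + 1 <= limit for line in render(pairs, top_width)):
            return top_width
    return 0

print(widest_fit([("she", 553), ("you", 481)]))                     # 73
print(widest_fit([("she", 553), ("superlongstringstring", 481)]))   # 64
```

Because the search works on the truncated bar lengths, it can settle on a top bar one underscore wider than an exact-ratio calculation would allow.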

An explanation of an older version can be found here.

Ruby, 215, 216, 218, 221, 224, 236, 237 chars

update 1: Hurray! It's a tie with JS Bangs' solution. Can't think of a way to cut it down any more :)

update 2: Played a dirty golf trick. Changed each to map to save 1 character :)

update 3: Changed File.read to IO.read, +2. Array.group_by wasn't very fruitful, changed to reduce, +6. The case-insensitive check in the regex is not needed after lowercasing with downcase, +1. Sorting in descending order is easily done by negating the value, +6. Total savings: +15

update 4: [0] rather than .first, +3. (@Shtééf)

update 5: Expand variable l in-place, +1. Expand variable s in-place, +2. (@Shtééf)

update 6: Use string addition rather than interpolation for the first line, +2. (@Shtééf)

w=(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take 22;m=76-w[0][0].size;puts' '+'_'*m;w.map{|x,f|puts"|#{'_'*(f*1.0/w[0][1]*m)}| #{x} "}

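update 7: I went through a whole lot of hoopla to detect the first iteration inside the loop, using instance variables. All I got is +1, though perhaps there is potential. Preserving the previous version, because I believe this one is black magic. (@Shtééf)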

(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take(22).map{|x,f|@f||(@f=f;puts' '+'_'*(@m=76-x.size));puts"|#{'_'*(f*1.0/@f*@m)}| #{x} "}

Readable version

string = File.read($_).downcase

words = string.scan(/[a-z]+/i)
allowed_words = words - %w{the and of to a i it in or is}
sorted_words = allowed_words.group_by{ |x| x }.map{ |x,y| [x, y.size] }.sort{ |a,b| b[1] <=> a[1] }.take(22)
highest_frequency = sorted_words.first
highest_frequency_count = highest_frequency[1]
highest_frequency_word = highest_frequency[0]

word_length = highest_frequency_word.size
widest = 76 - word_length

puts " #{'_' * widest}"    
sorted_words.each do |word, freq|
  width = (freq * 1.0 / highest_frequency_count) * widest
  puts "|#{'_' * width}| #{word} "
end

Usage:

echo "Alice.txt" | ruby -ln GolfedWordFrequencies.rb

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so

Python 2.x, latitudinarian approach = 227 183 chars

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)if w not in'andithetoforinis')[:22]
for l,w in r:print(78-len(r[0][1]))*l/r[0][0]*'=',w

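Allowing for freedom in the implementation, I constructed a string concatenation that contains all the words requested for exclusion (the, and, of, to, a, i, it, in, or, is). It also excludes the two infamous "words" s and t from the example, and I threw in for free the exclusion of an, for, he. I tried all concatenations of those words against the corpus of words from Alice, the King James Bible and the Jargon File to see whether any words would be mis-excluded by the string. That is how I ended up with two exclusion strings: itheandtoforinis and andithetoforinis.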
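
The exclusion trick relies on Python's substring test: `w not in 'andithetoforinis'` rejects every word that occurs anywhere inside the concatenated string. A short Python 3 sketch of what that does and does not exclude (the helper name is illustrative):

```python
EXCLUDE = "andithetoforinis"

def is_excluded(word):
    # Substring test, as in the golfed code: w not in 'andithetoforinis'
    return word in EXCLUDE

requested = "the and of to a i it in or is".split()
extras = ["s", "t", "an", "for", "he"]
assert all(is_excluded(w) for w in requested + extras)

# Side effect: any other fragment of the string is dropped as well.
print([w for w in ["alice", "she", "he", "nd", "on"] if is_excluded(w)])
# ['he', 'nd']
```

This is why the author had to test the string against real corpora: the approach is only safe as long as no genuine word happens to be a fragment of it.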

PS. Borrowed from other solutions to shorten the code.

=========================================================================== she 
================================================================= you
============================================================== said
====================================================== alice
================================================ was
============================================ that
===================================== as
================================= her
============================== at
============================== with
=========================== on
=========================== all
======================== this
======================== had
======================= but
====================== be
====================== not
===================== they
==================== so
=================== very
=================== what
================= little

Rant

Regarding the words to ignore, one would think those would be taken from a list of the most used words in English. That list depends on the text corpus used. Per some of the most popular lists (http://en.wikipedia.org/wiki/Most_common_words_in_English , http://www.english-for-students.com/Frequently-Used-Words.html , http://www.sporcle.com/games/common_english_words.php ), the top 10 words are: the be(am/are/is/was/were) to of and a in that have I

The top 10 words from the Alice in Wonderland text are the and to a of it she i you said
The top 10 words from the Jargon File (v4.4.7) are the a of to and in is that or for

So the question is why or was included in the problem's ignore list, where it is ~30th in popularity, when the word that (8th most used) is not. Hence I believe the ignore list should be provided dynamically (or could be omitted).

An alternative idea would be simply to skip the top 10 words from the result, which would actually shorten the solution (elementary: only the 11th to 32nd entries have to be shown).
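
The alternative is easy to sketch in Python 3: count everything, then slice off the most frequent entries instead of filtering against a fixed list. The function name and defaults are illustrative only:

```python
import re
from collections import Counter

def chart_words_skipping_top(text, skip=10, show=22):
    """Drop the `skip` most frequent words instead of using an ignore list."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # With the defaults this returns the 11th to 32nd entries.
    return counts.most_common(skip + show)[skip:]
```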

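Python 2.x, punctilious approach = 277 243 chars

The chart drawn by the code above is simplified (it uses only one character for the bars). If one wants to reproduce the chart from the problem description exactly (which was not required), this code will do it: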

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)-set(sys.argv))[:22]
h=min(9*l/(77-len(w))for l,w in r)
print'',9*r[0][0]/h*'_'
for l,w in r:print'|'+9*l/h*'_'+'|',w

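I take issue with the somewhat random choice of the 10 words to exclude (the, and, of, to, a, i, it, in, or, is), so those are passed as command-line parameters, like so:
python WordFrequencyChart.py the and of to a i it in or is <"Alice's Adventures in Wonderland.txt"

This is 213 chars + 30 if we account for the "original" ignore list passed on the command line = 243

PS. The second code also does the "adjustment" for the lengths of all top words, so none of them will overflow in a degenerate case.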

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|_________________________________________| was
|______________________________________| that
|_______________________________| as
|____________________________| her
|__________________________| at
|__________________________| with
|_________________________| s
|_________________________| t
|_______________________| on
|_______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|___________________| not
|_________________| they
|_________________| so

Haskell - 366 351 344 337 333 characters

(One line break in main was added for readability, and no line break is needed at the end of the last line.)

import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f

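How it works is best seen by reading the argument to interact backwards:

map f lowercases alphabetics and replaces everything else with spaces.
words produces a list of words, dropping the separating whitespace.
filter (`notElem` words "the and of to a i it in or is") discards all entries that are forbidden words.
group . sort sorts the words and groups identical ones into lists.
map h maps each list of identical words to a tuple of the form (-frequency, word).
take 22 . sort sorts the tuples by descending frequency (the first tuple entry) and keeps only the first 22 tuples.
b maps tuples to bars (see below).
a prepends the first line of underscores, to complete the topmost bar.
unlines joins all these lines together with newlines.

The tricky bit is getting the bar length right. I assumed that only underscores count towards the length of the bar, so || would be a bar of zero length. The function b maps c x over x, where x is the list of histograms. The entire list is passed to c, so that each invocation of c can compute the scale factor for itself by calling u. (In the listing above, c and u appear as the operators ! and ?.) In this way, I avoid using floating-point math or rationals, whose conversion functions and imports would eat many characters.

Note the trick of using -frequency. This removes the need to reverse the sort, since sorting -frequency in ascending order places the words with the largest frequency first. Later, in the function u, two -frequency values are combined, which cancels the negation out.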
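
Both tricks carry over to Python, whose // operator floors like Haskell's div. The sketch below is a loose analogue of the scaling step, not a line-by-line port; bar_widths is a made-up name and the constant 77 is taken from the Haskell listing.

```python
def bar_widths(pairs):
    """pairs: (-count, word) tuples, as produced by the `h` step above."""
    pairs = sorted(pairs)[:22]      # ascending sort puts the largest count first

    def width(q):
        # q and g are both negative, so the signs cancel in q * ... // g.
        return min(q * (77 - len(w)) // g for g, w in pairs)

    return [(w, width(q)) for q, w in pairs]

print(bar_widths([(-2, "you"), (-3, "she")]))
# [('she', 74), ('you', 49)]
```

Every bar is limited by the tightest (count, word length) combination in the list, using integer arithmetic only.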

JavaScript 1.8 (SpiderMonkey) - 354

x={};p='|';e=' ';z=[];c=77
while(l=readline())l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y)x[y]?x[y].c++:z.push(x[y]={w:y,c:1}))
z=z.sort(function(a,b)b.c-a.c).slice(0,22)
for each(v in z){v.r=v.c/z[0].c
c=c>(l=(77-v.w.length)/v.r)?l:c}for(k in z){v=z[k]
s=Array(v.r*c|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

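Sadly, the for([k,v]in z) from the Rhino version doesn't seem to want to work in SpiderMonkey, and readFile() is a little easier than using readline(), but moving up to 1.8 allows us to use function closures to cut a few more lines....

Adding whitespace for readability: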

x={};p='|';e=' ';z=[];c=77
while(l=readline())
  l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,
   function(y) x[y] ? x[y].c++ : z.push( x[y] = {w: y, c: 1} )
  )
z=z.sort(function(a,b) b.c - a.c).slice(0,22)
for each(v in z){
  v.r=v.c/z[0].c
  c=c>(l=(77-v.w.length)/v.r)?l:c
}
for(k in z){
  v=z[k]
  s=Array(v.r*c|0).join('_')
  if(!+k)print(e+s+e)
  print(p+s+p+e+v.w)
}

Usage: js golf.js < input.txt

Output:

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|____________________________| s
|____________________________| t
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| for
|______________________| had
|______________________| but
|_____________________| be
|_____________________| not
|___________________| they
|___________________| so

(base version - doesn't handle bar widths correctly)

JavaScript (Rhino) - 405 395 387 377 368 343 304 chars

~~I think my sorting logic is off, but..~~ Brainfart fixed.

Minified (abusing the fact that \n is sometimes interpreted as a ;):

x={};p='|';e=' ';z=[]
readFile(arguments[0]).toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y){x[y]?x[y].c++:z.push(x[y]={w:y,c:1})})
z=z.sort(function(a,b){return b.c-a.c}).slice(0,22)
for([k,v]in z){s=Array((v.c/z[0].c)*70|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

PHP CLI version (450 chars)

This solution takes into account the last requirement, which most purists have conveniently chosen to ignore. That cost 170 characters!

Usage: php.exe <this.php> <file.txt>

Minified:

<?php $a=array_count_values(array_filter(preg_split('/[^a-z]/',strtolower(file_get_contents($argv[1])),-1,1),function($x){return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);}));arsort($a);$a=array_slice($a,0,22);function R($a,$F,$B){$r=array();foreach($a as$x=>$f){$l=strlen($x);$r[$x]=$b=$f*$B/$F;if($l+$b>76)return R($a,$f,76-$l);}return$r;}$c=R($a,max($a),76-strlen(key($a)));foreach($a as$x=>$f)echo '|',str_repeat('-',$c[$x]),"| $x\n";?>

Human readable:

<?php

// Read:
$s = strtolower(file_get_contents($argv[1]));

// Split:
$a = preg_split('/[^a-z]/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Remove unwanted words:
$a = array_filter($a, function($x){
       return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);
     });

// Count:
$a = array_count_values($a);

// Sort:
arsort($a);

// Pick top 22:
$a=array_slice($a,0,22);


// Recursive function to adjust bar widths
// according to the last requirement:
function R($a,$F,$B){
    $r = array();
    foreach($a as $x=>$f){
        $l = strlen($x);
        $r[$x] = $b = $f * $B / $F;
        if ( $l + $b > 76 )
            return R($a,$f,76-$l);
    }
    return $r;
}

// Apply the function:
$c = R($a,max($a),76-strlen(key($a)));


// Output:
foreach ($a as $x => $f)
    echo '|',str_repeat('-',$c[$x]),"| $x\n";

?>
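
The recursive function R above restarts the scaling whenever a bar plus its word would overflow, using the offending word as the new reference. A Python 3 sketch of the same idea, with illustrative names and a dict that is assumed to be ordered by descending count:

```python
def rescale(counts, ref_count, ref_width):
    """counts: dict of word -> count, most frequent first."""
    bars = {}
    for word, count in counts.items():
        bars[word] = bar = count * ref_width / ref_count
        if len(word) + bar > 76:
            # This word overflows: make it the new reference and start over.
            return rescale(counts, count, 76 - len(word))
    return bars

counts = {"she": 553, "superlongstringstring": 481}
first = next(iter(counts))
bars = rescale(counts, counts[first], 76 - len(first))
print({w: int(b) for w, b in bars.items()})
# {'she': 63, 'superlongstringstring': 55}
```

Each restart strictly lowers the underscores-per-occurrence ratio, so the recursion ends after at most one pass per word.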

Output:

|-------------------------------------------------------------------------| she
|---------------------------------------------------------------| you
|------------------------------------------------------------| said
|-----------------------------------------------------| alice
|-----------------------------------------------| was
|-------------------------------------------| that
|------------------------------------| as
|--------------------------------| her
|-----------------------------| at
|-----------------------------| with
|--------------------------| on
|--------------------------| all
|-----------------------| this
|-----------------------| for
|-----------------------| had
|-----------------------| but
|----------------------| be
|---------------------| not
|--------------------| they
|--------------------| so
|-------------------| very
|------------------| what

When there is a long word, the bars are adjusted properly:

|--------------------------------------------------------| she
|---------------------------------------------------| thisisareallylongwordhere
|-------------------------------------------------| you
|-----------------------------------------------| said
|-----------------------------------------| alice
|------------------------------------| was
|---------------------------------| that
|---------------------------| as
|-------------------------| her
|-----------------------| with
|-----------------------| at
|--------------------| on
|--------------------| all
|------------------| this
|------------------| for
|------------------| had
|-----------------| but
|-----------------| be
|----------------| not
|---------------| they
|---------------| so
|--------------| very
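The restart-on-overflow idea behind the PHP function R($a,$F,$B) can be sketched in Python roughly as follows. This is only an illustration, not the author's code: the names bar_widths, rescale and limit are made up for the sketch, and it assumes every word is shorter than the 76-column budget.

```python
def bar_widths(counts, limit=76):
    """counts: list of (word, count) pairs, most frequent first.

    Hypothetical helper mirroring the PHP function R: scale every bar
    against a reference row, and restart with a tighter reference as
    soon as some row would overflow the limit.
    """
    def rescale(ref_count, ref_width):
        widths = {}
        for word, count in counts:
            width = count * ref_width / ref_count
            if len(word) + width > limit:
                # This row overflows: make it the new reference and start over.
                return rescale(count, limit - len(word))
            widths[word] = int(width)
        return widths

    first_word, first_count = counts[0]
    return rescale(first_count, limit - len(first_word))


if __name__ == "__main__":
    rows = [("she", 100), ("thisisareallylongwordhere", 90), ("you", 80)]
    for word, width in bar_widths(rows).items():
        print("|" + "-" * width + "| " + word)
```

Each restart strictly lowers the scale, and the row that triggered it then fits exactly, so the recursion ends after at most one restart per row.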

Python 3.1 - (245) 229 characters

I guess using Counter is kind of cheating :) I read about it only a week ago, so this was the perfect chance to see how it works.

import re,collections
o=collections.Counter([w for w in re.findall("[a-z]+",open("!").read().lower())if w not in"a and i in is it of or the to".split()]).most_common(22)
print('\n'.join('|'+76*v//o[0][1]*'_'+'| '+k for k,v in o))

Prints out:

|____________________________________________________________________________| she
|__________________________________________________________________| you
|_______________________________________________________________| said
|_______________________________________________________| alice
|_________________________________________________| was
|_____________________________________________| that
|_____________________________________| as
|__________________________________| her
|_______________________________| with
|_______________________________| at
|______________________________| s
|_____________________________| t
|____________________________| on
|___________________________| all
|________________________| this
|________________________| for
|________________________| had
|________________________| but
|______________________| be
|______________________| not
|_____________________| they
|____________________| so

Some of the code was "borrowed" from AKX's solution.
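For readers who have not met collections.Counter yet, here is an ungolfed sketch of the counting step. The helper name top_words and the STOP_WORDS constant are assumptions made for this illustration; unlike the golfed version it takes the text as an argument instead of reading a file.

```python
import re
from collections import Counter

# The ignore list from the challenge rules.
STOP_WORDS = frozenset("the and of to a i it in or is".split())


def top_words(text, limit=22):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # Only runs of a-z count as words, so "don't" splits into "don" and "t".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)


if __name__ == "__main__":
    print(top_words("She said the cat and she don't", 3))
```

most_common already returns the pairs in descending order of count, which is why the golfed answer needs no explicit sort.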

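Perl, 205 191 189 characters / 205 characters (fully implemented)

Some parts were inspired by the earlier Perl/Ruby submissions, a couple of similar ideas were arrived at independently, and the rest are original. The shorter versions also incorporate a few things I saw or learned from other submissions.

Original: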

$k{$_}++for grep{$_!~/^(the|and|of|to|a|i|it|in|or|is)$/}map{lc=~/[a-z]+/g}<>;@t=sort{$k{$b}<=>$k{$a}}keys%k;$l=76-length$t[0];printf" %s
",'_'x$l;printf"|%s| $_
",'_'x int$k{$_}/$k{$t[0]}*$l for@t[0..21];

~~Latest version~~ down to 191 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-y///c)/$k{$_=$e[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@e[0,0..21]

Latest version, down to 189 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@_=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-m//)/$k{$_=$_[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@_[0,0..21]

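This version (205 characters) accounts for rows whose word is longer than the one in the first row, so a later long word can still shrink the bars.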

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;($r)=sort{$a<=>$b}map{(76-y///c)/$k{$_}}@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
";}@e[0,0..21]

Perl: 203 202 201 198 195 208 203 / 231 chars

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;map{$z=$x{$_};$y||{$y=(76-y///c)/$z}&&warn" "."_"x($z*$y)."\n";printf"|%.78s\n","_"x($z*$y)."| $_"}(sort{$x{$b}<=>$x{$a}}keys%x)[0..21]

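Alternate, full implementation including the indicated behaviour (global bar-squishing) for the pathological case in which a secondary word is both popular and long enough to push its row past 80 characters (this implementation is 231 chars):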

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;@e=(sort{$x{$b}<=>$x{$a}}keys%x)[0..21];for(@e){$p=(76-y///c)/$x{$_};($y&&$p>$y)||($y=$p)}warn" "."_"x($x{$e[0]}*$y)."\n";for(@e){warn"|"."_"x($x{$_}*$y)."| $_\n"}

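The specification didn't state anywhere that this had to go to STDOUT, so I used perl's warn() instead of print: four characters saved there. I used map instead of foreach, but I feel there could still be more savings in the split(join()). Still, I got it down to 203 and might sleep on it. At least Perl is now under the "shell, grep, tr, grep, sort, uniq, sort, head, perl" character count, for now ;)

PS: Reddit says "Hi" ;)

Update: Removed join() in favour of assignment and implicit scalar conversion join. Down to 202. Also note that I have taken advantage of the optional "ignore 1-letter words" rule to shave off 2 characters, so bear in mind that the frequency counts reflect this.

Update 2: Swapped out the assignment and implicit join for killing $/ to get the file in one gulp using <> in the first place. Same size, but nastier. Swapped out if(!$y){} for $y||{}&&, which saved 1 more char => 201.

Update 3: Took control of lowercasing early (lc<>) by moving lc out of the map block. Swapped out both regexes to no longer use the /i option, as it is no longer needed. Swapped the explicit conditional x?y:z construct for the traditional perlgolf || implicit conditional construct: /^...$/i?1:$x{$_}++ became /^...$/||$x{$_}++. Saved three characters! => 198, broke the 200 barrier. Might sleep soon... perhaps.

Update 4: Sleep deprivation has made me insane. Well. More insane. Figuring that this only has to parse normal happy text files, I made it give up if it hits a null. Saved two characters. Replaced "length" with the 1-char shorter (and much more golfish) y///c. I'm coming for you! *sob*

Update 5: Sleep dep made me forget about the 22-row limit and the limiting of subsequent lines. Back up to 208 with those handled. Not too bad: 13 characters to handle it isn't the end of the world. Played around with perl's regex inline eval, but had trouble getting it to both work and save chars... lol. Updated the example to match the current output.

Update 6: Removed the unneeded braces protecting (...)for, since the syntactic candy ++ lets it be shoved up against the for happily. Thanks to input from Chas. Owens (reminding my tired brain), the character class i[tns] solution is in there. Back down to 203.

Update 7: Added a second piece of work, a full implementation of the spec (including the full bar-squishing behaviour for secondary long words, instead of the truncation most people are doing based on the original spec without the pathological example case).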

Examples:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

Alternative implementation on the pathological case example:

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| with
|_________________________| at
|_______________________| on
|______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| so
|________________| very
|________________| what

F#, 452 chars

Straightforward: get a sequence a of word-count pairs, find the best word-count-per-column multiplier k, then print the results.

let a=
 stdin.ReadToEnd().Split(" .?!,\":;'\r\n".ToCharArray(),enum 1)
 |>Seq.map(fun s->s.ToLower())|>Seq.countBy id
 |>Seq.filter(fun(w,n)->not(set["the";"and";"of";"to";"a";"i";"it";"in";"or";"is"].Contains w))
 |>Seq.sortBy(fun(w,n)-> -n)|>Seq.take 22
let k=a|>Seq.map(fun(w,n)->float(78-w.Length)/float n)|>Seq.min
let u n=String.replicate(int(float(n)*k)-2)"_"
printfn" %s "(u(snd(Seq.nth 0 a)))
for(w,n)in a do printfn"|%s| %s "(u n)w

Example (I get different frequency counts than you do, not sure why):

% app.exe < Alice.txt

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| t
|____________________________| s
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| had
|______________________| for
|_____________________| but
|_____________________| be
|____________________| not
|___________________| they
|__________________| so

Python 2.6, 347 chars

import re
W,x={},"a and i in is it of or the to".split()
[W.__setitem__(w,W.get(w,0)-1)for w in re.findall("[a-z]+",file("11.txt").read().lower())if w not in x]
W=sorted(W.items(),key=lambda p:p[1])[:22]
bm=(76.-len(W[0][0]))/W[0][1]
U=lambda n:"_"*int(n*bm)
print "".join(("%s\n|%s| %s "%((""if i else" "+U(n)),U(n),w))for i,(w,n)in enumerate(W))

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so

*sh (+ curl), partial solution

This is incomplete, but for the hell of it, here is the word-frequency counting half of the problem in 192 bytes:

curl -s http://www.gutenberg.org/files/11/11.txt|sed -e 's@[^a-z]@\n@gi'|tr '[:upper:]' '[:lower:]'|egrep -v '(^[^a-z]*$|\b(the|and|of|to|a|i|it|in|or|is)\b)' |sort|uniq -c|sort -n|tail -n 22

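Gawk - 336 (originally 507) characters

(after fixing the output formatting; fixing the contractions thing; tweaking some more; found another place to save a few, but gave two back to fix the bar length bug)

Heh heh! For the moment I am ahead of Matt's JavaScript solution (a counter-challenge) ;) and AKX's Python.

The problem seems to call for a language that implements native associative arrays, so of course I chose one with a horribly deficient set of operators on them. In particular, you cannot control the order in which awk offers up the elements of a hash map, so I repeatedly scan the whole map to find the currently most numerous item, print it, and delete it from the array.

It is all terribly inefficient, and with all the golfing I have done it has become pretty awful as well.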
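The scan, print, delete loop described above looks roughly like this when written out in Python. It is a sketch of the technique only, not a translation of the awk script: take_most_frequent is a made-up name, and it returns the pairs instead of printing them.

```python
def take_most_frequent(counts, limit=22):
    """Repeatedly scan the mapping for the largest count, emit it, delete it."""
    remaining = dict(counts)  # work on a copy; the awk script mutates its array
    ordered = []
    while remaining and len(ordered) < limit:
        best = None
        for word in remaining:
            # Plain linear scan: no ordering is assumed from the mapping.
            if best is None or remaining[word] > remaining[best]:
                best = word
        ordered.append((best, remaining.pop(best)))
    return ordered


if __name__ == "__main__":
    print(take_most_frequent({"a": 1, "b": 3, "c": 2}, limit=2))
```

Each of the 22 rows costs one full pass over the map, so the work grows with 22 times the number of distinct words rather than with a single sort.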

Minified:

{gsub("[^a-zA-Z]"," ");for(;NF;NF--)a[tolower($NF)]++}
END{split("the and of to a i it in or is",b," ");
for(w in b)delete a[b[w]];d=1;for(w in a){e=a[w]/(78-length(w));if(e>d)d=e}
for(i=22;i;--i){e=0;for(w in a)if(a[w]>e)e=a[x=w];l=a[x]/d-2;
t=sprintf(sprintf("%%%dc",l)," ");gsub(" ","_",t);if(i==22)print" "t;
print"|"t"| "x;delete a[x]}}

Line breaks are for clarity only: they are not necessary and should not be counted.

Output:

$ gawk -f wordfreq.awk.min < 11.txt 
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so
$ sed 's/you/superlongstring/gI' 11.txt | gawk -f wordfreq.awk.min
 ______________________________________________________________________
|______________________________________________________________________| she
|_____________________________________________________________| superlongstring
|__________________________________________________________| said
|__________________________________________________| alice
|____________________________________________| was
|_________________________________________| that
|_________________________________| as
|______________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|__________________________| t
|________________________| on
|________________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|_________________| so

Readable; 633 characters (originally 949):

{
    gsub("[^a-zA-Z]"," ");
    for(;NF;NF--)
    a[tolower($NF)]++
}
END{
    # remove "short" words
    split("the and of to a i it in or is",b," ");
    for (w in b) 
    delete a[b[w]];
    # Find the bar ratio
    d=1;
    for (w in a) {
    e=a[w]/(78-length(w));
    if (e>d)
        d=e
    }
    # Print the entries highest count first
    for (i=22; i; --i){               
    # find the highest count
    e=0;
    for (w in a) 
        if (a[w]>e)
        e=a[x=w];
        # Print the bar
    l=a[x]/d-2;
    # make a string of "_" the right length
    t=sprintf(sprintf("%%%dc",l)," ");
    gsub(" ","_",t);
    if (i==22) print" "t;
    print"|"t"| "x;
    delete a[x]
    }
}

Common LISP, 670 characters

I'm a LISP newbie, and this is an attempt at using a hash table for counting (so probably not the most compact method).

(flet((r()(let((x(read-char t nil)))(and x(char-downcase x)))))(do((c(
make-hash-table :test 'equal))(w NIL)(x(r)(r))y)((not x)(maphash(lambda
(k v)(if(not(find k '("""the""and""of""to""a""i""it""in""or""is"):test
'equal))(push(cons k v)y)))c)(setf y(sort y #'> :key #'cdr))(setf y
(subseq y 0(min(length y)22)))(let((f(apply #'min(mapcar(lambda(x)(/(-
76.0(length(car x)))(cdr x)))y))))(flet((o(n)(dotimes(i(floor(* n f)))
(write-char #\_))))(write-char #\Space)(o(cdar y))(write-char #\Newline)
(dolist(x y)(write-char #\|)(o(cdr x))(format t "| ~a~%"(car x))))))
(cond((char<= #\a x #\z)(push x w))(t(incf(gethash(concatenate 'string(
reverse w))c 0))(setf w nil)))))

It can be run, for example, with cat alice.txt | clisp -C golf.lisp.

In readable form:

(flet ((r () (let ((x (read-char t nil)))
               (and x (char-downcase x)))))
  (do ((c (make-hash-table :test 'equal))  ; the word count map
       w y                                 ; current word and final word list
       (x (r) (r)))  ; iteration over all chars
       ((not x)

        ; make a list with (word . count) pairs removing stopwords
        (maphash (lambda (k v)
                   (if (not (find k '("" "the" "and" "of" "to"
                                      "a" "i" "it" "in" "or" "is")
                                  :test 'equal))
                       (push (cons k v) y)))
                 c)

        ; sort and truncate the list
        (setf y (sort y #'> :key #'cdr))
        (setf y (subseq y 0 (min (length y) 22)))

        ; find the scaling factor
        (let ((f (apply #'min
                        (mapcar (lambda (x) (/ (- 76.0 (length (car x)))
                                               (cdr x)))
                                y))))
          ; output
          (flet ((outx (n) (dotimes (i (floor (* n f))) (write-char #\_))))
             (write-char #\Space)
             (outx (cdar y))
             (write-char #\Newline)
             (dolist (x y)
               (write-char #\|)
               (outx (cdr x))
               (format t "| ~a~%" (car x))))))

       ; add alphabetic to current word, and bump word counter
       ; on non-alphabetic
       (cond
        ((char<= #\a x #\z)
         (push x w))
        (t
         (incf (gethash (concatenate 'string (reverse w)) c 0))
         (setf w nil)))))

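C (828)

It looks a lot like obfuscated code, and it uses glib for strings, lists and hashes. The character count with wc -m is 828. It does not consider single-character words. To calculate the maximum bar length, it considers the longest word among all words, not only among the first 22. Is this a deviation from the spec?

It does not handle failures and it does not release the memory it uses.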
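To make the question concrete, here is a small Python sketch of the conservative sizing described above: the bar budget is derived from the longest word anywhere in the text, whether or not that word makes the chart. The function name conservative_widths and its parameters are invented for this illustration. Every row is guaranteed to fit, but the bars can end up shorter than the challenge's "maximize the bar width" wording seems to ask for.

```python
def conservative_widths(all_counts, top=22, line_width=80):
    """all_counts: dict mapping every counted word to its frequency."""
    ranked = sorted(all_counts.items(), key=lambda item: -item[1])[:top]
    # Longest word in the whole text, not just among the charted rows.
    longest = max(len(word) for word in all_counts)
    # A row is '|' + bar + '| ' + word, i.e. three columns of overhead,
    # which mirrors the (77 - z) term in the C code.
    budget = line_width - 3 - longest
    peak = ranked[0][1]
    return [(word, count * budget // peak) for word, count in ranked]


if __name__ == "__main__":
    counts = {"she": 10, "you": 5, "extraordinarily": 1}
    for word, width in conservative_widths(counts, top=2):
        print("|" + "=" * width + "| " + word)
```

In this example the rare 15-letter word never appears in the chart, yet it shrinks the top bar from a possible 74 columns to 62.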

#include <glib.h>
#define S(X)g_string_##X
#define H(X)g_hash_table_##X
GHashTable*h;int m,w=0,z=0;y(const void*a,const void*b){int*A,*B;A=H(lookup)(h,a);B=H(lookup)(h,b);return*B-*A;}void p(void*d,void*u){int *v=H(lookup)(h,d);if(w<22){g_printf("|");*v=*v*(77-z)/m;while(--*v>=0)g_printf("=");g_printf("| %s\n",d);w++;}}main(c){int*v;GList*l;GString*s=S(new)(NULL);h=H(new)(g_str_hash,g_str_equal);char*n[]={"the","and","of","to","it","in","or","is"};while((c=getchar())!=-1){if(isalpha(c))S(append_c)(s,tolower(c));else{if(s->len>1){for(c=0;c<8;c++)if(!strcmp(s->str,n[c]))goto x;if((v=H(lookup)(h,s->str))!=NULL)++*v;else{z=MAX(z,s->len);v=g_malloc(sizeof(int));*v=1;H(insert)(h,g_strdup(s->str),v);}}x:S(truncate)(s,0);}}l=g_list_sort(H(get_keys)(h),y);m=*(int*)H(lookup)(h,g_list_first(l)->data);g_list_foreach(l,p,NULL);}

Perl, 185 chars

~~200 (slightly broken)~~ ~~199~~ ~~197~~ ~~195~~ ~~193~~ ~~187~~ 185 characters. The last two newlines are significant. Complies with the spec.

map$X{+lc}+=!/^(.|the|and|to|i[nst]|o[rf])$/i,/[a-z]+/gfor<>;
$n=$n>($:=$X{$_}/(76-y+++c))?$n:$:for@w=(sort{$X{$b}-$X{$a}}%X)[0..21];
die map{$U='_'x($X{$_}/$n);" $U
"x!$z++,"|$U| $_
"}@w

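The first line loads the counts of valid words into %X.

The second line computes the minimum scaling factor so that all output lines will be <= 80 characters.

The third line (which contains two newline characters) produces the output.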
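The second and third steps can be written out in Python along these lines. This is a sketch under stated assumptions, not a port of the Perl: render_chart is a hypothetical name, it takes already-counted (word, count) pairs, and it scales in terms of columns per occurrence, whereas the Perl keeps the inverse ratio in $n and divides by it.

```python
def render_chart(counts, line_width=80):
    """counts: list of (word, count) pairs, most frequent first."""
    # Each row is '|' + bar + '|' + ' ' + word + ' ', so 4 columns are overhead.
    budget = line_width - 4
    # The row with the least room per occurrence decides the global scale.
    scale = min((budget - len(word)) / count for word, count in counts)
    bars = ["_" * int(count * scale) for _, count in counts]
    lines = [" " + bars[0]]  # the "lid" drawn above the first bar
    lines += ["|%s| %s" % (bar, word) for bar, (word, _) in zip(bars, counts)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_chart([("she", 4), ("you", 2)]))
```

Taking the minimum over all rows is what makes a long word in a lower row shrink every bar, rather than only the first row setting the scale.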

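Java - 886 865 756 744 742 744 752 742 714 680 chars

Updates before the first 742: improved the regex, removed superfluous parameterized types, removed superfluous whitespace.
Update 742 > 744 chars: fixed the fixed-length hack. It depends only on the 1st word, not on the other words (yet). Found several places to shorten the code (\\s in the regex replaced by a space, and ArrayList replaced by Vector). I'm now looking for a short way to remove the Commons IO dependency and read from stdin.
Update 744 > 752 chars: removed the commons dependency. It now reads from stdin. Paste the text into stdin and hit Ctrl+Z to get the result.
Update 752 > 742 chars: removed public and a space, made the class name 1 char instead of 2, and it now ignores one-letter words.
Update 742 > 714 chars: updated as per Carl's comments: removed a redundant assignment (742 > 730), replaced m.containsKey(k) by m.get(k)!=null (730 > 728), introduced substringing of the line (728 > 714).
Update 714 > 680 chars: updated as per Rotsor's comments: improved the bar size calculation to remove unnecessary casting, and improved split() to remove the unnecessary replaceAll().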

import java.util.*;class F{public static void main(String[]a)throws Exception{StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);}}

More readable version:

import java.util.*;
class F{
 public static void main(String[]a)throws Exception{
  StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));
  final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);
  List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});
  int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);
  for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);
 }
}

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

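It pretty much sucks that Java doesn't have String#join() and closures (yet).

Edit by Rotsor:

I have made several changes to your solution:

Replaced the List with a String[]
Reused the 'args' argument instead of declaring my own String array. Also used it as the argument to .toArray()
Replaced StringBuffer with a String (yes, yes, terrible performance)
Replaced Java sorting with a selection sort with early halting (only the first 22 elements have to be found)
Aggregated some int declarations into a single statement
Implemented the non-cheating algorithm that finds the most limiting line of output. Implemented it without FP.
Fixed the problem of the program crashing when there were fewer than 22 distinct words in the text
Implemented a new algorithm for reading input, which is fast and only 9 characters longer than the slow one.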
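The "most limiting line without FP" item is the least obvious of these changes, so here is how it might look in Python. This is a sketch of the cross-multiplication trick as I read it from the Java test `y*i>x*j`; the names limiting_ratio, bar_width and budget are assumptions for this illustration.

```python
def limiting_ratio(counts, budget=76):
    """Return (num, den) where num/den is the smallest (budget - len(word)) / count.

    counts: list of (word, count) pairs with positive counts.
    """
    num, den = None, None
    for word, count in counts:
        room = budget - len(word)
        # room/count < num/den  <=>  room*den < num*count, since counts are
        # positive. Comparing the products avoids floating point entirely.
        if num is None or room * den < num * count:
            num, den = room, count
    return num, den


def bar_width(count, num, den):
    """Integer bar width for a row, rounded down."""
    return count * num // den


if __name__ == "__main__":
    rows = [("she", 100), ("thisisareallylongwordhere", 90), ("you", 80)]
    num, den = limiting_ratio(rows)
    for word, count in rows:
        print("|" + "_" * bar_width(count, num, den) + "| " + word)
```

Keeping the scale as a pair of integers means the limiting row gets exactly its available room, with no rounding surprises from a floating-point scale.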

The condensed code is ~~688~~ ~~711~~ 684 characters long:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;(j=System.in.read())>0;w+=(char)j);for(String W:w.toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(W,m.get(W)!=null?m.get(W)+1:1);l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

The fast version (~~720~~ 693 characters)

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

More readable version:

import java.util.*;class F{public static void main(String[]l)throws Exception{
    Map<String,Integer>m=new HashMap();String w="";
    int i=0,k=0,j=8,x,y,g=22;
    for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{
        if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";
    }}
    l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;
    for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}
    for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}
    String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');
    System.out.println(" "+s);
    for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}
}

The version without the behaviour improvements is 615 characters:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);for(;i<g;++i)for(j=i;++j<l.length;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}i=76-l[0].length();String s=new String(new char[i]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/m.get(l[0]))+"| "+w);}}}

Scala 2.8, 311 314 320 330 332 336 341 375 characters

Includes the long-word adjustment. Ideas borrowed from the other solutions.

Now as a script (a.scala):

val t="\\w+\\b(?<!\\bthe|and|of|to|a|i[tns]?|or)".r.findAllIn(io.Source.fromFile(argv(0)).mkString.toLowerCase).toSeq.groupBy(w=>w).mapValues(_.size).toSeq.sortBy(-_._2)take 22
def b(p:Int)="_"*(p*(for((w,c)<-t)yield(76.0-w.size)/c).min).toInt
println(" "+b(t(0)._2))
for(p<-t)printf("|%s| %s \n",b(p._2),p._1)

Run with

scala -howtorun:script a.scala alice.txt

BTW, the edit from 314 to 311 characters actually removes only 1 character. Someone got the counting wrong before (Windows CRs?).

Clojure 282 strict

(let[[[_ m]:as s](->>(slurp *in*).toLowerCase(re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")frequencies(sort-by val >)(take 22))[b](sort(map #(/(- 76(count(key %)))(val %))s))p #(do(print %1)(dotimes[_(* b %2)](print \_))(apply println %&))](p " " m)(doseq[[k v]s](p \| v \| k)))

Somewhat more legibly:

(let[[[_ m]:as s](->> (slurp *in*)
                   .toLowerCase
                   (re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")
                   frequencies
                   (sort-by val >)
                   (take 22))
     [b] (sort (map #(/ (- 76 (count (key %)))(val %)) s))
     p #(do
          (print %1)
          (dotimes[_(* b %2)] (print \_))
          (apply println %&))]
  (p " " m)
  (doseq[[k v] s] (p \| v \| k)))

Scala, 368 chars

First, a legible version in 592 characters:

object Alice {
  def main(args:Array[String]) {
    val s = io.Source.fromFile(args(0))
    val words = s.getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase)
    val freqs = words.foldLeft(Map[String, Int]())((countmap, word)  => countmap + (word -> (countmap.getOrElse(word, 0)+1)))
    val sortedFreqs = freqs.toList.sort((a, b)  => a._2 > b._2)
    val top22 = sortedFreqs.take(22)
    val highestWord = top22.head._1
    val highestCount = top22.head._2
    val widest = 76 - highestWord.length
    println(" " + "_" * widest)
    top22.foreach(t => {
      val width = Math.round((t._2 * 1.0 / highestCount) * widest).toInt
      println("|" + "_" * width + "| " + t._1)
    })
  }
}

The console output looks like this:

$ scalac alice.scala 
$ scala Alice aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

With some aggressive minifying it gets down to 415 characters:

object A{def main(args:Array[String]){val l=io.Source.fromFile(args(0)).getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase).foldLeft(Map[String, Int]())((c,w)=>c+(w->(c.getOrElse(w,0)+1))).toList.sort((a,b)=>a._2>b._2).take(22);println(" "+"_"*(76-l.head._1.length));l.foreach(t=>println("|"+"_"*Math.round((t._2*1.0/l.head._2)*(76-l.head._1.length)).toInt+"| "+t._1))}}

The console session looks like this:

$ scalac a.scala 
$ scala A aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

I'm sure a Scala expert could do even better.
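One thing worth checking: the charts printed by this answer contain neither "this" nor "for", which the other answers do report. A plausible cause, offered as a guess rather than a confirmed diagnosis, is that in the lookbehind (?<!\bthe|and|of|...) the \b binds only to "the", so every other alternative rejects any word that merely ends in it. The Python sketch below reproduces that reading. Python's re module does not accept variable-width lookbehind, so the pattern is split into one lookbehind per alternative; the names SUFFIX_FILTER and WHOLE_WORD_FILTER are made up for the illustration.

```python
import re

# Lookbehind-style filter, split into fixed-width pieces. Only "the" carries
# a \b anchor, as in the Scala pattern, so the other alternatives match any
# word that *ends* in them ("for" ends in "or", "this" ends in "is").
SUFFIX_FILTER = re.compile(
    r"\w+\b(?<!\bthe)(?<!and)(?<!of)(?<!to)(?<!a)(?<!i)"
    r"(?<!it)(?<!in)(?<!or)(?<!is)"
)

# Lookahead filter that anchors every alternative as a whole word.
WHOLE_WORD_FILTER = re.compile(
    r"\b(?!(?:the|and|of|to|a|i|it|in|or|is)\b)[a-z]+"
)

sample = "this is for the hand of alice"

if __name__ == "__main__":
    print(SUFFIX_FILTER.findall(sample))      # ['alice']
    print(WHOLE_WORD_FILTER.findall(sample))  # ['this', 'for', 'hand', 'alice']
```

If that reading is right, grouping the alternatives so the boundary applies to all of them, or using a lookahead as in the second pattern, would bring "this" and "for" back into the chart at the cost of a few characters.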

Update: In the comments, Thomas gave an even shorter version, at 368 characters:

object A{def main(a:Array[String]){val t=(Map[String, Int]()/:(for(x<-io.Source.fromFile(a(0)).getLines;y<-"(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r findAllIn x) yield y.toLowerCase).toList)((c,x)=>c+(x->(c.getOrElse(x,0)+1))).toList.sortBy(_._2).reverse.take(22);val w=76-t.head._1.length;print(" "+"_"*w);t map (s=>"\n|"+"_"*(s._2*w/t.head._2)+"| "+s._1) foreach print}}

Legibly, at 375 characters:

object Alice {
  def main(a:Array[String]) {
    val t = (Map[String, Int]() /: (
      for (
        x <- io.Source.fromFile(a(0)).getLines;
        y <- "(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(x)
      ) yield y.toLowerCase
    ).toList)((c, x) => c + (x -> (c.getOrElse(x, 0) + 1))).toList.sortBy(_._2).reverse.take(22)
    val w = 76 - t.head._1.length
    print (" "+"_"*w)
    t.map(s => "\n|" + "_" * (s._2 * w / t.head._2) + "| " + s._1).foreach(print)
  }
}

Java - 896 chars

931 chars

1233 chars made unreadable

1977 chars "uncompressed"

Update: I have aggressively reduced the character count. It omits single-letter words, per the updated spec.

I envy C# and LINQ so much.

import java.util.*;import java.io.*;import static java.util.regex.Pattern.*;class g{public static void main(String[] a)throws Exception{PrintStream o=System.out;Map<String,Integer> w=new HashMap();Scanner s=new Scanner(new File(a[0])).useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));while(s.hasNext()){String z=s.next().trim().toLowerCase();if(z.equals(""))continue;w.put(z,(w.get(z)==null?0:w.get(z))+1);}List<Integer> v=new Vector(w.values());Collections.sort(v);List<String> q=new Vector();int i,m;i=m=v.size()-1;while(q.size()<22){for(String t:w.keySet())if(!q.contains(t)&&w.get(t).equals(v.get(i)))q.add(t);i--;}int r=80-q.get(0).length()-4;String l=String.format("%1$0"+r+"d",0).replace("0","_");o.println(" "+l);o.println("|"+l+"| "+q.get(0)+" ");for(i=m-1;i>m-22;i--){o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");}}}

"Readable":

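import java.util.*;
import java.io.*;
import static java.util.regex.Pattern.*;
class g
{
   public static void main(String[] a)throws Exception
      {
      PrintStream o = System.out;
      Map<String,Integer> w = new HashMap();
      // Split on runs of non-letters and on ignored words; "." between \b's drops
      // single-letter words. Flag 2 is Pattern.CASE_INSENSITIVE.
      Scanner s = new Scanner(new File(a[0]))
         .useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));
      while(s.hasNext())
      {
         String z = s.next().trim().toLowerCase();
         if(z.equals(""))
            continue;
         // Count occurrences of each word.
         w.put(z,(w.get(z) == null?0:w.get(z))+1);
      }
      // All counts, sorted ascending; the largest count ends up at index m.
      List<Integer> v = new Vector(w.values());
      Collections.sort(v);
      List<String> q = new Vector();
      int i,m;
      i = m = v.size()-1;
      // Collect words in descending order of count until at least 22 are found.
      while(q.size()<22)
      {
         for(String t:w.keySet())
            if(!q.contains(t)&&w.get(t).equals(v.get(i)))
               q.add(t);
         i--;
      }
      // Width of the top bar: 80 columns minus the top word and 4 framing characters.
      int r = 80-q.get(0).length()-4;
      // Build a string of r underscores.
      String l = String.format("%1$0"+r+"d",0).replace("0","_");
      o.println(" "+l);
      o.println("|"+l+"| "+q.get(0)+" ");
      // Remaining 21 bars, scaled relative to the top count.
      for(i = m-1; i > m-22; i--)
      {
         o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");
      }
   }
}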
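
The Scanner delimiter above packs tokenizing, case folding and the ignore list into one regular expression. As a rough, non-golfed sketch of the same rules (not the answer's own code): the class name `WordSplitSketch` and the method `words` are made up for illustration, and it uses a `Matcher` over `[a-z]+` rather than a Scanner delimiter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the golfed answer.
public class WordSplitSketch {
    // Ignore list taken from the challenge rules.
    static final Set<String> IGNORED = new HashSet<>(Arrays.asList(
            "the", "and", "of", "to", "a", "i", "it", "in", "or", "is"));

    // Only a-z counts as part of a word (input is lowercased first).
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // Returns the lowercased words of text, skipping ignored and single-letter words.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String w = m.group();
            if (w.length() > 1 && !IGNORED.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "don't" splits into "don" and "t"; "t" is then dropped as a single letter.
        System.out.println(words("She said: don't do it, the Cat!"));
    }
}
```

Running `main` prints `[she, said, don, do, cat]`.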

Output for Alice:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|___________________________| on
|__________________________| all
|________________________| this
|________________________| for
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what
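
The bar lengths come from the scaling step in the Java code: the top word gets `80 - length - 4` underscores, and every other bar is that width multiplied by `count / topCount` and rounded. Here is a minimal sketch of just that arithmetic. The class name `BarWidthSketch`, its method names and the counts are made up for illustration; they are not the real counts from the text.

```java
// Hypothetical helper for illustration; not part of the golfed answer.
public class BarWidthSketch {
    // Widest bar such that "|" + bar + "| " + topWord + " " fits in 80 columns.
    static int maxBarWidth(String topWord) {
        return 80 - topWord.length() - 4;
    }

    // Scale a count against the top count, rounding to the nearest column.
    static int barWidth(int count, int topCount, int maxWidth) {
        return (int) Math.round(maxWidth * (double) count / topCount);
    }

    // A run of n underscores.
    static String underscores(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('_');
        }
        return sb.toString();
    }

    // One chart row: bar, space, word, trailing space.
    static String line(int width, String word) {
        return "|" + underscores(width) + "| " + word + " ";
    }

    public static void main(String[] args) {
        String[] words = {"she", "you", "said"};
        int[] counts = {400, 300, 100}; // made-up counts, sorted descending
        int max = maxBarWidth(words[0]);
        System.out.println(" " + underscores(max)); // top rule above the first bar
        for (int i = 0; i < words.length; i++) {
            System.out.println(line(barWidth(counts[i], counts[0], max), words[i]));
        }
    }
}
```

Like the golfed version, this sizes the chart from the top word only, so it does not handle the case the challenge mentions where a lower-ranked word is longer than the first.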

Output for Don Quixote (also from Gutenberg):

 ________________________________________________________________________
|________________________________________________________________________| that
|________________________________________________________| he
|______________________________________________| for
|__________________________________________| his
|________________________________________| as
|__________________________________| with
|_________________________________| not
|_________________________________| was
|________________________________| him
|______________________________| be
|___________________________| don
|_________________________| my
|_________________________| this
|_________________________| all
|_________________________| they
|________________________| said
|_______________________| have
|_______________________| me
|______________________| on
|______________________| so
|_____________________| you
|_____________________| quixote

Reference URL: https://stackoverflow.com/questions/3169051/build-an-ascii-chart-of-the-most-used-words-in-a-given-text
