
"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

crosscheck 2020. 9. 4. 06:53



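I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and ask here before I throw my laptop out the window.

I'm trying to parse the response XML from a call to AWS SimpleDB. The response is coming back just fine; for example, it may look like this: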

<?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
    <ListDomainsResult>
        <DomainName>Audio</DomainName>
        <DomainName>Course</DomainName>
        <DomainName>DocumentContents</DomainName>
        <DomainName>LectureSet</DomainName>
        <DomainName>MetaData</DomainName>
        <DomainName>Professors</DomainName>
        <DomainName>Tag</DomainName>
    </ListDomainsResult>
    <ResponseMetadata>
        <RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId>
        <BoxUsage>0.0000071759</BoxUsage>
    </ResponseMetadata>
</ListDomainsResponse>

I pass this XML to a parser with:

XMLEventReader eventReader = xmlInputFactory.createXMLEventReader(response.getContent());

and then call eventReader.nextEvent(); several times to get the data I want.
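For readers unfamiliar with the StAX event API, the read loop described above might look roughly like the sketch below. The class and method names (DomainNameReader, readDomainNames) are hypothetical, and the InputStream parameter stands in for response.getContent(); this is not the AWS SDK's own unmarshalling code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: collects the text of every <DomainName> element.
final class DomainNameReader {

    static List<String> readDomainNames(InputStream content) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader eventReader = factory.createXMLEventReader(content);
        List<String> names = new ArrayList<>();
        try {
            while (eventReader.hasNext()) {
                // The first nextEvent() call is where a prolog error would surface.
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()
                        && "DomainName".equals(event.asStartElement().getName().getLocalPart())) {
                    names.add(eventReader.getElementText());
                }
            }
        } finally {
            eventReader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<ListDomainsResponse xmlns=\"http://sdb.amazonaws.com/doc/2009-04-15/\">"
                + "<ListDomainsResult><DomainName>Audio</DomainName>"
                + "<DomainName>Course</DomainName></ListDomainsResult>"
                + "</ListDomainsResponse>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readDomainNames(in));
    }
}
```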

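Here's the bizarre part: it works fine on the local server. The response comes in, I parse it, and everyone is happy. The problem is that when I deploy the code to Google App Engine, the outgoing request still works and the response XML looks 100% identical and correct, but the response fails to parse with the following exception: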

com.amazonaws.http.HttpClient handleResponse: Unable to unmarshall response (ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.): <?xml version="1.0" encoding="utf-8"?> 
<ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><ListDomainsResult><DomainName>Audio</DomainName><DomainName>Course</DomainName><DomainName>DocumentContents</DomainName><DomainName>LectureSet</DomainName><DomainName>MetaData</DomainName><DomainName>Professors</DomainName><DomainName>Tag</DomainName></ListDomainsResult><ResponseMetadata><RequestId>42330b4a-e134-6aec-e62a-5869ac2b4575</RequestId><BoxUsage>0.0000071759</BoxUsage></ResponseMetadata></ListDomainsResponse>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:153)
    ... (rest of lines omitted)

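I've double, triple, and quadruple checked this XML for "invisible characters", non-UTF-8 encoded characters, and so on. I went through it byte by byte in an array looking for byte order marks or anything of that nature. Nothing; it passes every validation test I can throw at it. Even stranger, it also happens if I use a Saxon-based parser, but only on GAE. It always works fine in my local environment.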
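A byte-level check like the one described can be sketched as follows. The class and method names (PrologInspector, leadingHex, detectBom) are made up for illustration, and the sketch only recognizes the UTF-8 and UTF-16 byte order marks, so treat it as a starting point rather than a complete detector.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for eyeballing the first bytes of a response.
final class PrologInspector {

    /** Formats up to `limit` leading bytes as space-separated hex, e.g. "EF BB BF 3C". */
    static String leadingHex(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(limit, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Names the byte order mark at the start of the data, or returns "none". */
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return "none";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] body = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(leadingHex(body, 8) + "  bom=" + detectBom(body));
    }
}
```

Logging this on the deployed server, on the exact bytes handed to the parser, is more telling than inspecting a copy of the response that has already been decoded into a String.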

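It is very hard to trace the code for problems when I can only run the debugger in an environment where everything works perfectly (I haven't found a good way to debug remotely on GAE). Nevertheless, using the primitive means I have, I've tried a million approaches, including:

  • XML with and without the prolog
  • With and without newlines
  • With and without the "encoding=" attribute in the prolog
  • Both newline styles
  • With and without the chunking information present in the HTTP stream

And I've tried most of these in multiple combinations where it seemed plausible that they would interact. I'm at my wit's end. Has anyone seen an issue like this before?

Thanks!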


The encoding in your XML and XSD (or DTD) are different.
XML file header: <?xml version='1.0' encoding='utf-8'?>
XSD file header: <?xml version='1.0' encoding='utf-16'?>

Another possible scenario that causes this is when anything comes before the XML declaration, i.e., you might have something like this in the buffer:

helloworld<?xml version="1.0" encoding="utf-8"?>  

or even a space or special character.

There are some special characters called byte order marks (BOMs) that could be in the buffer. Before passing the buffer to the parser, do this:

String xml = "<?xml ...";
xml = xml.trim().replaceFirst("^([\\W]+)<","<");
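The regex above works on a String. If the parser is fed an InputStream instead, a similar cleanup can be sketched with a PushbackInputStream. The class and method names here (BomSkipper, skipUtf8Bom) are hypothetical, and the sketch removes only a UTF-8 BOM, not arbitrary leading junk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF) from a stream.
final class BomSkipper {

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the stream unchanged.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        InputStream cleaned = skipUtf8Bom(new ByteArrayInputStream(xml));
        System.out.println((char) cleaned.read());
    }
}
```

Unlike the regex, this leaves everything after the BOM untouched, so a document that legitimately starts with whitespace or other content is passed through as-is.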

This error message is typically caused by invalid content before the XML declaration, for example a stray dot “.” at the very beginning of the document.

Any characters before the “<?xml…” will cause the “org.xml.sax.SAXParseException: Content is not allowed in prolog” error message.

To fix it, just delete all those stray characters before the “<?xml”.

Ref: http://www.mkyong.com/java/sax-error-content-is-not-allowed-in-prolog/


I was facing the same issue. In my case the XML files were generated by a C# program and fed into an AS400 for further processing. After some analysis I found that I was using UTF-8 encoding (which writes a BOM by default in C#) while generating the XML files, whereas javac (on the AS400) uses "UTF8 without BOM". So I had to write extra code similar to the following:

//create encoding with no BOM
Encoding outputEnc = new UTF8Encoding(false); 
//open file with encoding
TextWriter file = new StreamWriter(filePath, false, outputEnc);           

file.Write(doc.InnerXml);
file.Flush();
file.Close(); // save and close it

Removing the XML declaration solved it:

<?xml version='1.0' encoding='utf-8'?>

I ran into this issue after inspecting the XML file in Notepad++ and saving it, even though the file started with the UTF-8 XML declaration <?xml version="1.0" encoding="utf-8"?>

It was fixed by saving the file in Notepad++ with Encoding (menu) > Encode in UTF-8 selected (it had been Encode in UTF-8-BOM).


In my xml file, the header looked like this:

<?xml version="1.0" encoding="utf-16"?>

In a test file, I was reading the file bytes and decoding the data as UTF-8 (not realizing the header in this file was utf-16) to create a string.

byte[] data = Files.readAllBytes(Paths.get(path));
String dataString = new String(data, "UTF-8");

When I tried to deserialize this string into an object, I was seeing the same error:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.

When I updated the second line to

String dataString = new String(data, "UTF-16");

I was able to deserialize the object just fine. So as Romain had noted above, the encodings need to match.
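One way to avoid guessing the charset at all is to hand the raw bytes to the parser and let it work out the encoding from the BOM and the XML declaration. A minimal sketch, assuming the JDK's built-in StAX implementation; the class and method names (EncodingAwareParse, rootElementName) are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: parses raw bytes without decoding them to a String first.
final class EncodingAwareParse {

    /** Returns the local name of the root element, or null if none is found. */
    static String rootElementName(byte[] xmlBytes) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Passing an InputStream (not a Reader or String) lets the parser detect the encoding.
        XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root/>";
        // Java's UTF_16 charset encodes big-endian and prepends a BOM.
        byte[] utf16 = doc.getBytes(StandardCharsets.UTF_16);
        System.out.println(rootElementName(utf16));
    }
}
```

With this shape there is no second line to get wrong: the same call handles a UTF-8 file and a UTF-16 file, as long as each file's declaration matches its actual bytes.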


I was facing the same "Content is not allowed in prolog" problem with my XML file.

Solution

Initially my root folder was '#Filename'.

When I removed the first character, '#', the error was resolved.

There is no need to remove the '#' from the file name. Try it this way:

Instead of passing a File or URL object to the unmarshaller method, use a FileInputStream.

File myFile = new File("........");
Object obj = unmarshaller.unmarshal(new FileInputStream(myFile));

I had a tab character instead of spaces. Replacing the tab '\t' fixed the problem.

Cut and paste the whole doc into an editor like Notepad++ and display all characters.


In my instance of the problem, the solution was to replace German umlauts (äöü) with their HTML equivalents...


Below are possible causes of the “org.xml.sax.SAXParseException: Content is not allowed in prolog” exception.

  1. First check the file paths of schema.xsd and file.xml.
  2. The encoding in your XML and XSD (or DTD) should be the same.
    XML file header: <?xml version='1.0' encoding='utf-8'?>
    XSD file header: <?xml version='1.0' encoding='utf-8'?>
  3. Check whether anything comes before the XML declaration, e.g.: hello<?xml version='1.0' encoding='utf-16'?>

In my case, I had the problem with a build.xml file. It was solved simply by going to Build > Clean Project.


In the spirit of "just delete all those weird characters before the <?xml", here's my Java code, which works well with input via a BufferedReader:

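    BufferedReader test = new BufferedReader(new InputStreamReader(fisTest));
    test.mark(4);
    while (true) {
        int earlyChar = test.read();
        System.out.println(earlyChar);
        if (earlyChar == -1) {
            break;              // end of input without finding '<': stop instead of looping forever
        }
        if (earlyChar == 60) {  // 60 is '<': rewind so the parser sees it as the first character
            test.reset();
            break;
        } else {
            test.mark(4);       // discard this character and move the mark forward
        }
    }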

FWIW, the bytes I was seeing are (in decimal): 239, 187, 191.


Unexpected reason: # character in file path

Due to some internal bug, the error Content is not allowed in prolog also appears if the file content itself is 100% correct but you are supplying the file name like C:\Data\#22\file.xml.

This may possibly apply to other special characters, too.

How to check: If you move your file into a path without special characters and the error disappears, then it was this issue.
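If moving the file is not an option, the FileInputStream workaround mentioned earlier can be sketched as below. A plausible explanation, though it is an assumption and not confirmed in this thread, is that the path is turned into a URI-style system identifier in which '#' starts a fragment; parsing from a stream sidesteps that because the parser never sees the path. The class and method names (HashPathParse, rootTag) and the temporary directory are illustrative only.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: parses a file via an InputStream instead of a File or URL.
final class HashPathParse {

    /** Returns the tag name of the document's root element. */
    static String rootTag(Path xmlFile) throws Exception {
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = Files.newInputStream(xmlFile)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTagName();
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative temp directory whose name contains '#', mimicking C:\Data\#22\file.xml.
        Path dir = Files.createTempDirectory("#22");
        Path file = dir.resolve("file.xml");
        Files.write(file, "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(rootTag(file));
    }
}
```

One trade-off: without a system identifier, the parser cannot resolve relative references (such as a relative DTD path) inside the document.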

Reference URL: https://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae
