Programing

자바 코드 몇 줄로 문자열을 URL로 읽기

crosscheck 2020. 6. 19. 18:59
반응형

자바 코드 몇 줄로 문자열을 URL로 읽기


Groovy와 동등한 Java를 찾으려고합니다.

String content = "http://www.google.com".toURL().getText();

URL에서 문자열로 내용을 읽고 싶습니다. 그런 간단한 작업을 위해 버퍼 스트림과 루프로 코드를 오염시키고 싶지 않습니다. 아파치의 HttpClient를 살펴 보았지만 한두 줄 구현도 보지 못했습니다.


원래의 대답이 받아 들여진 후 어느 정도 시간이 지났으므로 더 나은 접근 방법이 있습니다.

String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();

한 줄이 아닌 약간 더 완전한 구현을 원한다면 다음을 수행하십시오.

public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}

이 답변은 이전 버전의 Java를 나타냅니다. ccleve의 답변을보고 싶을 수도 있습니다.


이를 수행하는 전통적인 방법은 다음과 같습니다.

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        while ((inputLine = in.readLine()) != null) 
            response.append(inputLine);

        in.close();

        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}

@extraneon이 제안 했듯이 ioutils를 사용하면 여전히 Java 정신에있는 매우 웅변적인 방법 으로이 작업을 수행 할 수 있습니다.

 InputStream in = new URL( "http://jakarta.apache.org" ).openStream();

 try {
   System.out.println( IOUtils.toString( in ) );
 } finally {
   IOUtils.closeQuietly(in);
 }

Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.


Now that more time has passed, here's a way to do it in Java 8:

URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}

There's an even better way as of Java 9:

URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}

Like the original groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)


Additional example using Guava:

URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);

If you have the input stream (see Joe's answer) also consider ioutils.toString( inputstream ).

http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)


The following works with Java 7/8, secure urls, and shows how to add a cookie to your request as well. Note this is mostly a direct copy of this other great answer on this page, but added the cookie example, and clarification in that it works with secure urls as well ;-)

If you need to connect to a server with an invalid certificate or self signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, you could consider the approach detailed in this answer to this related question on StackOverflow.

Example

String result = getUrlAsString("https://www.google.com");
System.out.println(result);

outputs

<!doctype html><html itemscope="" .... etc

Code

import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public static String getUrlAsString(String url)
{
    try
    {
        URL urlObj = new URL(url);
        URLConnection con = urlObj.openConnection();

        con.setDoOutput(true); // we want the response 
        con.setRequestProperty("Cookie", "myCookie=test123");
        con.connect();

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        String newLine = System.getProperty("line.separator");
        while ((inputLine = in.readLine()) != null)
        {
            response.append(inputLine + newLine);
        }

        in.close();

        return response.toString();
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:

private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
    String urlData = "";
    URL urlObj = new URL(aUrl);
    URLConnection conn = urlObj.openConnection();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) 
    {
        urlData = reader.lines().collect(Collectors.joining("\n"));
    }
    return urlData;
}

URL to String in pure Java

Example call

 String str = getStringFromUrl("YourUrl");

Implementation

You can use the method described in this answer, on How to read URL to an InputStream and combine it with this answer on How to read InputStream to String.

The outcome will be something like

public String getStringFromUrl(URL url) throws IOException {
        return inputStreamToString(urlToInputStream(url,null));
}

public String inputStreamToString(InputStream inputStream) throws IOException {
    try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }

        return result.toString(UTF_8);
    }
}

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends to 
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to 
         * switch protocols (e.g. HTTP to HTTPS), which is <b>not</b> 
         * default behavior. See: https://stackoverflow.com/questions/1884230 
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Pros

  • It is pure java

  • It can be easily enhanced by adding different headers (instead of passing a null object, like the example above does), authentication, etc.

  • Handling of protocol switches is supported

참고URL : https://stackoverflow.com/questions/4328711/read-url-to-string-in-few-lines-of-java-code

반응형