Parse an HTML string with JS

itgroup 2022. 10. 29. 14:17


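I want to parse a string that contains HTML text, and I want to do it in JavaScript.

I tried the Pure JavaScript HTML Parser library, but it seems to parse the HTML of the current page rather than the string I pass in: when I try the code below, the title of my page changes.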

var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

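My goal is to extract the links from an external HTML page that I have read in as a string.

Do you know of an API for this?

Create a dummy DOM element and add the string to it. Then you can manipulate it like any other DOM element.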

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements

It's quite simple:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/html');
// do whatever you want with htmlDoc.getElementsByTagName('a');

According to MDN, to do this in Chrome you need to parse as XML, like so:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/xml');
// do whatever you want with htmlDoc.getElementsByTagName('a');

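~~It is currently unsupported by WebKit, so you would have to follow Florian's answer, and in most cases it is not known to work on mobile browsers.~~

Edit: now widely supported.

Edit: the solution below is only for HTML "fragments", since the html, head and body tags are removed. The solution for this question is probably DOMParser's parseFromString() method: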

const parser = new DOMParser();
// Named `doc` rather than `document` so it does not clash with the global `document`
const doc = parser.parseFromString(html, "text/html");

For HTML fragments, the solutions listed here work for most HTML, but in certain cases they do not.

For example, try parsing <td>Test</td>. This does not work with the div.innerHTML solution, the DOMParser.prototype.parseFromString solution, or the range.createContextualFragment solution. The td tag goes missing and only the text remains.

Only jQuery handles this case well.

So the future solution (MS Edge 13+) is to use the template tag:

function parseHTML(html) {
    var t = document.createElement('template');
    t.innerHTML = html;
    return t.content;
}

var documentFragment = parseHTML('<td>Test</td>');

For older browsers, I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99

var doc = new DOMParser().parseFromString(html, "text/html");
var links = doc.querySelectorAll("a");
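
Since the stated goal is to pull links out of a page held as a string, the two lines above can be wrapped into a small helper. This is only a sketch: the name extractLinks and the injectable parser argument are assumptions made for illustration, not part of any answer here. In a browser the default DOMParser is used; the argument exists only so the function can be exercised where no DOM is available.

```javascript
// Hypothetical helper: collect the href attribute of every anchor in an HTML string.
// `parser` defaults to a real DOMParser in the browser; it is injectable (an
// assumption of this sketch) so the function can be tried without a DOM.
function extractLinks(html, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, 'text/html');
  // getAttribute returns the href exactly as written in the markup (unresolved)
  return Array.from(doc.querySelectorAll('a[href]'), (a) => a.getAttribute('href'));
}
```

In a browser, extractLinks("<a href='test0'>test01</a>") would go through the real parser. Using getAttribute rather than the href property keeps the values as they appear in the source string.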

The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:

var range = document.createRange();
range.selectNode(document.body); // required in Safari
var fragment = range.createContextualFragment('<h1>html...</h1>');
var firstNode = fragment.firstChild;

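I would recommend creating a helper function that uses createContextualFragment if available and falls back to innerHTML otherwise.

Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3

The following function parseHTML returns one of:

a Document, if the markup starts with a doctype.
a DocumentFragment, if the markup does not start with a doctype.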

Code:

function parseHTML(markup) {
    if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
        // Full document: build a detached HTML document and fill it
        var doc = document.implementation.createHTMLDocument("");
        doc.documentElement.innerHTML = markup;
        return doc;
    } else if ('content' in document.createElement('template')) {
        // Template tag exists!
        var el = document.createElement('template');
        el.innerHTML = markup;
        return el.content;
    } else {
        // Template tag doesn't exist!
        var docfrag = document.createDocumentFragment();
        var body = document.createElement('body');
        body.innerHTML = markup;
        // appendChild moves each node out of body, so loop until it is empty
        while (body.firstChild) {
            docfrag.appendChild(body.firstChild);
        }
        return docfrag;
    }
}

Usage:

var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');
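
The branch that parseHTML takes depends only on whether the markup begins with a doctype, so that check can be looked at in isolation. The sketch below is a regex restatement of the same test; the name startsWithDoctype is hypothetical and is not used by the function above.

```javascript
// Hypothetical helper mirroring the doctype check in parseHTML:
// case-insensitive, ignoring leading whitespace.
function startsWithDoctype(markup) {
  return /^\s*<!doctype/i.test(markup);
}

console.log(startsWithDoctype('<!DOCTYPE html><html></html>')); // true
console.log(startsWithDoctype('  <!doctype html>'));            // true
console.log(startsWithDoctype('<td>Test</td>'));                // false
```

Markup that passes this check comes back from parseHTML as a Document, so getElementsByTagName is available; anything else comes back as a DocumentFragment, where querySelectorAll is the method to reach for instead.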

const parse = Range.prototype.createContextualFragment.bind(document.createRange());

document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );

Only valid child Nodes within the parent Node (start of the Range) will be parsed. Otherwise, unexpected results may occur:

// <body> is "parent" Node, start of Range
const parseRange = document.createRange();
const parse = Range.prototype.createContextualFragment.bind(parseRange);

// Returns Text "1 2" because td, tr, tbody are not valid children of <body>
parse('<td>1</td> <td>2</td>');
parse('<tr><td>1</td> <td>2</td></tr>');
parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');

// Returns <table>, which is a valid child of <body>
parse('<table> <td>1</td> <td>2</td> </table>');
parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');

// <tr> is parent Node, start of Range
parseRange.setStart(document.createElement('tr'), 0);

// Returns [<td>, <td>] element array
parse('<td>1</td> <td>2</td>');
parse('<tr> <td>1</td> <td>2</td> </tr>');
parse('<tbody> <td>1</td> <td>2</td> </tbody>');
parse('<table> <td>1</td> <td>2</td> </table>');
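
The examples above suggest choosing the Range's start node from the first tag of the fragment. A rough sketch of that idea follows. The mapping table and the names contextTagFor and parseInContext are assumptions made for illustration: the table is a simplified guess covering only table-related tags, and fragments that start with a comment or text fall back to body.

```javascript
// Assumed, simplified mapping: first tag of the fragment -> a parent element that may contain it.
const CONTEXT_BY_TAG = new Map([
  ['td', 'tr'], ['th', 'tr'],
  ['tr', 'tbody'],
  ['tbody', 'table'], ['thead', 'table'], ['tfoot', 'table'],
  ['caption', 'table'], ['colgroup', 'table'],
  ['col', 'colgroup'],
]);

// Pure string inspection: returns the tag name to use as parsing context.
function contextTagFor(html) {
  const match = /^\s*<([a-z][a-z0-9]*)/i.exec(html);
  const first = match ? match[1].toLowerCase() : '';
  return CONTEXT_BY_TAG.get(first) || 'body';
}

// Browser-only: parse the fragment with the chosen element as the start of the Range.
function parseInContext(html) {
  const range = document.createRange();
  range.selectNodeContents(document.createElement(contextTagFor(html)));
  return range.createContextualFragment(html);
}
```

With this sketch, parseInContext('<td>1</td> <td>2</td>') would use a tr as context, matching the setStart(document.createElement('tr'), 0) case shown above.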

1 Way

document.cloneNode()

Performance:

Call to document.cloneNode() took ~0.2249977299012 milliseconds.

It may well take longer.

var t0, t1, html;

t0 = performance.now();
   html = document.cloneNode(true);
t1 = performance.now();

console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.")

html.documentElement.innerHTML = '<!DOCTYPE html><html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

console.log(html.getElementById("test1"));
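
A single performance.now() measurement like the one above is noisy, so the figures quoted in this answer are best read as rough indications. One way to smooth them is to average over many runs. The helper below is a sketch; the name averageMs and the default of 100 runs are arbitrary choices for illustration.

```javascript
// Hypothetical helper: average wall-clock time of fn over several runs, in milliseconds.
function averageMs(fn, runs = 100) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  return (performance.now() - start) / runs;
}

// Only meaningful where a DOM exists; skipped elsewhere.
if (typeof document !== 'undefined') {
  console.log(averageMs(() => document.cloneNode(true)) + ' ms per call');
}
```

The same helper can wrap any of the other document-creation approaches in this answer, which makes the timings easier to compare on one machine.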

2 Way

document.implementation.createHTMLDocument()

Performance:

Call to document.implementation.createHTMLDocument() took ~0.1000010128133 milliseconds.

var t0, t1, html;

t0 = performance.now();
html = document.implementation.createHTMLDocument("test");
t1 = performance.now();

console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.")

html.documentElement.innerHTML = '<!DOCTYPE html><html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

console.log(html.getElementById("test1"));

3 Way

document.implementation.createDocument()

Performance:

Call to document.implementation.createHTMLDocument() took ~0.1000010128133 milliseconds.

var t0 = performance.now();
// Declare html explicitly instead of relying on an implicit global
var html = document.implementation.createDocument('', 'html',
               document.implementation.createDocumentType('html', '', '')
           );
var t1 = performance.now();

console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.")

html.documentElement.innerHTML = '<html><head><title>Test</title></head><body><div id="test1">test</div></body></html>';

console.log(html.getElementById("test1"));

4 Way

new Document()

Performance:

Call to document.implementation.createHTMLDocument() took 0.13499999840860255 milliseconds.

Note

ParentNode.append (as of 2020)

var t0, t1, html;

t0 = performance.now();
//---------------
html = new Document();

html.append(
  html.implementation.createDocumentType('html', '', '')
);
    
html.append(
  html.createElement('html')
);
//---------------
t1 = performance.now();

console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.")

html.documentElement.innerHTML = '<html><head><title>Test</title></head><body><div id="test1">test1</div></body></html>';

console.log(html.getElementById("test1"));

To do this in node.js, you can use an HTML parser such as node-html-parser. The syntax looks like this:

import { parse } from 'node-html-parser';

const root = parse('<ul id="list"><li>Hello World</li></ul>');

console.log(root.firstChild.structure);
// ul#list
//   li
//     #text

console.log(root.querySelector('#list'));
// { tagName: 'ul',
//   rawAttrs: 'id="list"',
//   childNodes:
//    [ { tagName: 'li',
//        rawAttrs: '',
//        childNodes: [Object],
//        classNames: [] } ],
//   id: 'list',
//   classNames: [] }
console.log(root.toString());
// <ul id="list"><li>Hello World</li></ul>
root.set_content('<li>Hello World</li>');
root.toString();    // <li>Hello World</li>

If you are open to using jQuery, it has facilities for creating detached DOM elements from strings of HTML. These can then be queried in the usual way, for example:

var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
var anchors = $('<div/>').append(html).find('a').get();

Edit - I just saw @Florian's answer, which is correct. This is basically the same as what he said, but with jQuery.

I think the best way is to use this API like this:

//Table string in HTML format
const htmlString = '<table><tbody><tr><td>Cell 1</td><td>Cell 2</td></tr></tbody></table>';

//Parse using DOMParser native way
const parser = new DOMParser();
const $newTable = parser.parseFromString(htmlString, 'text/html');

//Here you can select parts of your parsed html and work with it
const $row = $newTable.querySelector('table > tbody > tr');

//Here i'm printing the number of columns (2)
const $containerHtml = document.getElementById('containerHtml');
$containerHtml.innerHTML = ['Your parsed table have ', $row.cells.length, 'columns.'].join(' ');

<div id="containerHtml"></div>

I had to use the innerHTML of an element parsed in an Angular NGX Bootstrap popover. This is the solution that worked for me.

public htmlContainer = document.createElement( 'html' );

In the constructor:

this.htmlContainer.innerHTML = ''; setTimeout(() => { this.convertToArray(); });

 convertToArray() {
    const shapesHC = document.getElementsByClassName('weekPopUpDummy');
    const shapesArrHCSpread = [...(shapesHC as any)];
    this.htmlContainer = shapesArrHCSpread[0];
    this.htmlContainer.innerHTML = shapesArrHCSpread[0].textContent;
  }

In the HTML:

<div class="weekPopUpDummy" [popover]="htmlContainer.innerHTML" [adaptivePosition]="false" placement="top" [outsideClick]="true" #popOverHide="bs-popover" [delay]="150" (onHidden)="onHidden(weekEvent)" (onShown)="onShown()">

function parseElement(raw){
    let el = document.createElement('div');
    el.innerHTML = raw;
    let res = el.querySelector('*');
    res.remove();
    return res;
}

Note: the raw string must not contain more than one element.

let content = "<center><h1>404 Not Found</h1></center>"
let result = $("<div/>").html(content).text()

Input: <center><h1>404 Not Found</h1></center>
Output: "404 Not Found"

Source URL: https://stackoverflow.com/questions/10585029/parse-an-html-string-with-js
