In this paper, a new approach called HDNA (HTML DNA) is introduced for analyzing and comparing Document Object Model (DOM) trees in order to detect differences in HTML pages. This method assigns an identifier to each HTML page based on its structure, which proves to be particularly useful for detecting variations caused by server-side updates, user interactions or potential security risks. The process involves preprocessing the HTML content generating a DOM tree and calculating the disparities between two or more trees. By assigning weights to the nodes valuable insights about their hierarchical importance are obtained. The effectiveness of the HDNA approach has been demonstrated in identifying changes in DOM trees even when dynamically generated content is involved. Not does this method benefit web developers, testers, and security analysts by offering a deeper understanding of how web pages evolve. It also helps ensure the functionality and performance of web applications. Additionally, it enables detection and response to vulnerabilities that may arise from modifications in DOM structures. As the web ecosystem continues to evolve HDNA proves to be a tool, for individuals engaged in web development, testing, or security analysis.
翻译:暂无翻译