mlreportgen.dom.HTMLFile class

Package: mlreportgen.dom

Convert an HTML file to a DOM document

Description

Converts the contents of an HTML file to an mlreportgen.dom.HTMLFile object containing DOM objects having the same content and format. You can append the HTMLFile object to a DOM document of any type, including Word and PDF documents.

Construction

htmlFileObj = HTMLFile(htmlFile) converts the HTML file to an HTMLFile object containing DOM objects having the same content and format.

An HTMLFile object supports these HTML elements and attributes. In addition, HTMLFile objects accept HTML that contains custom CSS properties, which begin with a hyphen. Custom CSS properties are supported in HTML, Microsoft® Word, and PDF output.

HTML Element              Attributes
a                         class, style, href, name
b                         class, style
body                      class, style
br                        n/a
code                      class, style
del                       class, style
div                       class, style
font                      class, style, color, face, size
h1, h2, h3, h4, h5, h6    class, style, align
hr                        class, style, align
i                         class, style
ins                       class, style
img                       class, style, src, height, width
li                        class, style
ol                        class, style
p                         class, style, align
pre                       class, style
s                         class, style
span                      class, style
strike                    class, style
sub                       class, style
sup                       class, style
table                     class, style, align, bgcolor, border, cellspacing, cellpadding, frame, rules, width
tbody                     class, style, align, valign
tfoot                     class, style, align, valign
thead                     class, style, align, valign
td                        class, style, bgcolor, height, width, colspan, rowspan, valign, nowrap
tr                        class, style, bgcolor, valign
tt                        class, style
u                         class, style
ul                        class, style

For information about these elements, see https://developer.mozilla.org/en-US/docs/Web/HTML/Element.

These CSS formats are supported:

  • background-color

  • border

  • border-bottom

  • border-bottom-color

  • border-bottom-style

  • border-bottom-width

  • border-color

  • border-left

  • border-left-color

  • border-left-style

  • border-left-width

  • border-right

  • border-right-color

  • border-right-style

  • border-right-width

  • border-style

  • border-top

  • border-top-color

  • border-top-style

  • border-top-width

  • border-width

  • color

  • counter-increment

  • counter-reset

  • display

  • font-family

  • font-size

  • font-style

  • font-weight

  • height

  • line-height

  • list-style-type

  • margin

  • margin-bottom

  • margin-left

  • margin-right

  • margin-top

  • padding

  • padding-bottom

  • padding-left

  • padding-right

  • padding-top

  • text-align

  • text-decoration

  • text-indent

  • vertical-align

  • white-space

  • width

For information about these formats, see https://developer.mozilla.org/en-US/docs/Web/CSS/Reference.

Input Arguments

htmlFile: HTML file path, specified as a character vector.

Properties

Note

For HTML markup to display correctly in your report, you must include end tags for empty elements and enclose attribute values in quotation marks. If you want to show a reserved XML markup character as text, you must use its equivalent named or numeric XML character.

Reserved Character    Description              Equivalent Character
>                     Greater than             &gt;
<                     Less than                &lt;
&                     Ampersand                &amp;
"                     Double quotation mark    &quot;
'                     Single quotation mark    &apos;
%                     Percent                  &#37;
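
The substitutions in the table can be applied programmatically before plain text is embedded in HTML markup. The following is a minimal sketch that uses only the base MATLAB strrep function; the variable names and the sample text are illustrative assumptions, not part of the HTMLFile API.

```matlab
% Escape reserved XML characters in plain text before embedding it in HTML.
% The ampersand is replaced first so that the '&' introduced by the other
% replacements is not escaped a second time.
rawText = '5 < 7 & "x" > 50%';                % sample text (assumption)

escaped = strrep(rawText, '&', '&amp;');
escaped = strrep(escaped, '<', '&lt;');
escaped = strrep(escaped, '>', '&gt;');
escaped = strrep(escaped, '"', '&quot;');
escaped = strrep(escaped, '''', '&apos;');    % '''' is a single quote character
escaped = strrep(escaped, '%', '&#37;');

% Wrap the escaped text in a paragraph element with an explicit end tag.
htmlText = ['<p>' escaped '</p>'];
disp(htmlText)
% displays: <p>5 &lt; 7 &amp; &quot;x&quot; &gt; 50&#37;</p>
```

The order matters: if the ampersand were replaced last, the entities produced by the earlier replacements would themselves be escaped.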

Id: A session-unique ID is generated as part of HTMLFile object creation. You can specify an ID to replace the generated ID.

HTMLTag: Tag name of the HTML container element that corresponds to this HTMLFile object, specified as a character vector, such as 'div', 'section', or 'article'. This property applies only to HTML output.

Children: This read-only property lists child elements that the HTMLFile object contains.

Parent: This read-only property lists the parent of this HTMLFile object.

Style: Formatting to apply to the HTMLFile object, specified as a cell array of DOM format objects. The children of this HTMLFile object inherit any of these formats that they do not override.

StyleName: Style name of this HTMLFile object, specified as a character vector. Use the name of a style specified in the style sheet of the document to which this HTMLFile object is appended. The specified style defines the appearance of the HTMLFile object in the output document where not overridden by the formats specified by the Style property of the HTMLFile object.

Tag: Tag for HTMLFile object, specified as a character vector.

A session-unique tag is generated as part of HTMLFile object creation. The generated tag has the form CLASS:ID, where CLASS is the class of the element and ID is the value of the Id property of the object. You can specify a tag to replace the generated tag.

Specify your own tag value, for example, to make it easier to identify where an issue occurred during document generation.
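
As an illustration of the formatting and tag properties described above, the following sketch applies two DOM format objects to an HTMLFile object and assigns a custom tag. The file name notes.html, the report name, and the tag value are hypothetical, and the sketch assumes that formats assigned after construction are inherited by the converted content when the report is generated.

```matlab
import mlreportgen.dom.*

% Write a small HTML file so that the sketch is self-contained
% (notes.html is a hypothetical file name).
fid = fopen('notes.html','w');
fprintf(fid,'%s','<html><body><p>Plain paragraph</p></body></html>');
fclose(fid);

rpt = Document('StyledReport','docx');      % hypothetical report name
htmlFile = HTMLFile('notes.html');

% Formats that child elements inherit unless they override them.
htmlFile.Style = {Color('blue'), FontFamily('Arial')};

% Custom tag to help locate this object if document generation
% reports an issue.
htmlFile.Tag = 'notes-section';

append(rpt,htmlFile);
close(rpt);
```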

Note

HTMLFile ignores the KeepInterElementWhiteSpace property. If you want to preserve white space, use fileread to read your HTML file as text and then follow the procedure described for the mlreportgen.dom.HTMLKeepInterElementWhiteSpace property.
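
A minimal sketch of that workaround follows. It assumes that the procedure consists of creating an empty mlreportgen.dom.HTML object, setting its KeepInterElementWhiteSpace property to true, and then appending the markup; check the mlreportgen.dom.HTML reference page for the authoritative steps. The file name spaced.html and the report name are hypothetical.

```matlab
import mlreportgen.dom.*

% Write a small HTML file so that the sketch is self-contained
% (spaced.html is a hypothetical file name).
fid = fopen('spaced.html','w');
fprintf(fid,'%s','<p><b>Hello</b> <i>World</i></p>');
fclose(fid);

% Read the HTML file as text instead of passing it to HTMLFile.
htmlText = fileread('spaced.html');

rpt = Document('WhiteSpaceReport','docx');  % hypothetical report name
htmlObj = HTML();                           % start from an empty HTML object
htmlObj.KeepInterElementWhiteSpace = true;  % set before appending the markup
append(htmlObj,htmlText);                   % convert the markup to DOM objects
append(rpt,htmlObj);
close(rpt);
```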

Methods

append    Append HTML to HTMLFile object

Examples

Create a text file named myHTML.html and save it in the current folder. Add this text into the file:

<html>
<head>
<style>p {font-size:14pt;}</style>
</head>
<body>
<p style='white-space:pre'><b>Hello</b><i style='color:green'> World</i></p>
<p>This is <u>me</u> speaking</p>
</body>
</html>

To convert the myHTML.html file to a Word report, run these commands:

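import mlreportgen.dom.*;
rpt = Document('MyReport','docx');     % create a Word (docx) report
htmlFile = HTMLFile('myHTML.html');    % convert the HTML file to DOM objects
append(rpt,htmlFile);                  % add the converted content to the report
close(rpt);                            % write the report to disk
rptview(rpt.OutputPath);               % open the generated report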

The resulting Word report contains the text that you specified in the HTML file.

Tips

  • MATLAB® Report Generator™ mlreportgen.dom.HTML or mlreportgen.dom.HTMLFile objects typically cannot accept the raw HTML output of third-party applications, such as Microsoft Word, that export native documents as HTML markup. In these cases, your Report API report generation program can use the mlreportgen.utils.html2dom.prepHTMLString and mlreportgen.utils.html2dom.prepHTMLFile functions to prepare the raw HTML for use with the mlreportgen.dom.HTML or mlreportgen.dom.HTMLFile objects. Typically, your program will have to further process the prepared HTML to remove valid but undesirable objects, such as line feeds that were in the raw content.

  • Word and PDF documents require inline elements, such as text and links, to be contained in a paragraph. To meet this requirement, the HTML parser creates wrapper paragraphs to contain inline elements that are not already in a paragraph. If you create an mlreportgen.dom.HTML or mlreportgen.dom.HTMLFile object from HTML that contains inline elements that are not in paragraphs and add the object to an HTML document, the generated HTML can differ from the input HTML. To generate the inline elements without the added wrapper paragraphs, insert the HTML markup into an HTML document by using an mlreportgen.dom.RawText object.

  • By default, the DOM API uses a base font size of 12 points to convert em units to actual font sizes. For example, a font size specified as 2em converts to 24 points. To specify a different base font size, add your content to a report by using an mlreportgen.dom.HTML object. Set the EMBaseFontSize property of the object to the base font size. For example, if you set the EMBaseFontSize property to 14, a font size of 2em converts to 28 points.
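
The em conversion described in the last tip can be sketched as follows. The arithmetic restates the tip (2em at a 14-point base gives 28 points); the rest assumes that EMBaseFontSize takes effect when it is set on an empty mlreportgen.dom.HTML object before the markup is appended. The report name and the sample markup are hypothetical.

```matlab
import mlreportgen.dom.*

baseSize = 14;                          % base font size in points
emValue = 2;                            % font size in the markup, in em units
expectedPoints = emValue * baseSize;    % 2em at a 14-point base is 28 points

rpt = Document('EmReport','docx');      % hypothetical report name
htmlObj = HTML();                       % start from an empty HTML object
htmlObj.EMBaseFontSize = baseSize;      % set before appending the markup
append(htmlObj,'<p style="font-size:2em">Scaled text</p>');
append(rpt,htmlObj);
close(rpt);
```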

Introduced in R2015a