How a browser renders a web page.
How does a browser turn page output into a web page? In this post I’ll talk through the process a browser goes through to turn page output into the web page that you see and interact with.
When optimising a WordPress-based site to improve its performance audit results, understanding how a browser renders a web page is key to improving page output and achieving better performance metrics.
What is page output?
Page output is the page code that a WordPress site (or any website) generates and passes back to the browser when a page is requested.
If you browse to a page, right click on it, and then click on “view page source”, you’ll see the page output in its raw format.
To a human, the page output doesn’t make much sense or provide a friendly web browsing experience, so the browser takes this page output and turns it into the page that you see in the browser.
The process of turning page output into the page that you see is called “rendering”. The browser renders the page output into the page that it ultimately displays. How does it do that? Here’s how a browser renders a web page.
The initial page request.
This takes place before any rendering occurs.
You enter the address of a website into the browser’s address bar and press return.
The browser sends the request for the web page to the web server.
The web server receives the request and establishes the document root of the site being requested. The code in the document root then executes (in the context of WordPress, this is PHP being executed) and interacts with the site’s database (in the context of WordPress, this is where page content is held). Between them, the two generate the page output.
The page output is then passed back to the browser, via the web server, and the browser starts to render the page.
How a browser renders a web page (the “in my head” version).
This is the short-short version. The short version (which is a bit longer) is below.
The HTML is used to make a “document object model” (DOM). The DOM looks like a draft of a web page, like it was drawn with a pencil, and doesn’t have any styling.
The CSS is downloaded and parsed, and used to make the “CSS Object Model” (CSSOM). The CSSOM contains styling information. The CSSOM is then applied to the DOM and the page is updated. Now instead of looking like a draft of a web page drawn with a pencil, it has styling information such as colours, fonts and some aspects of layout.
You then get the page in the browser.
That process sounds quite sequential, but it’s actually happening all at once.
The reason it’s probably a good idea to read the other versions below is that they will give you an understanding of how the page rendering process can be hindered. An example of this might be render blocking CSS.
Render blocking CSS is caused by a stylesheet being referenced in the <head> of the HTML. The browser won’t render the page until that stylesheet has been downloaded and parsed, so if it’s big, or takes a lot of processing, it holds up page rendering: the CSSOM can’t be made until the CSS task has been completed, which in turn delays the CSSOM being applied to the DOM.
There are other issues that poor page output can cause, but explaining them all isn’t the aim of this post. What follows is aimed at helping you look at an issue reported by something like https://pagespeed.web.dev/ and know what’s happening and why, which in turn gives you some direction on how to address a flagged issue.
The Short “How a browser renders a web page” Explanation.
What follows is a brief version of how a browser renders a page:
- HTML Parsing: The browser begins by parsing (reading and interpreting) the HTML given in the page output.
- DOM (Document Object Model) Tree: The browser creates a DOM tree, which is a structured representation of the HTML. It’s like a family tree, with the HTML elements organised hierarchically. Each element becomes a node in this tree.
- CSS Parsing: While parsing the HTML, the browser also looks for and parses any linked or inline CSS (Cascading Style Sheets) files. CSS is used to define the visual style of the web page.
- CSSOM (CSS Object Model) Tree: Just like the DOM tree, the browser creates a CSSOM tree. This represents the styles applied to each element in the DOM tree.
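- Render Tree: The browser combines the DOM and the CSSOM to form a Render Tree, which contains only the elements that will be displayed, along with their styles.
- Layout: Once the Render Tree is ready, the browser calculates the layout of the page. This involves determining the position and size of each visible element. It takes into account things like the width of the viewport, margins, padding, and more.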
- Painting: Finally, the browser paints the pixels on the screen. It uses the layout information to draw each element in the correct position and style. The browser’s rendering engine takes care of this process.
- Rasterisation: The painted elements are converted into bitmaps (images) that can be displayed on your screen. These bitmaps are composed and displayed in layers, allowing for efficient rendering.
- Display: The bitmaps are displayed on your screen, and you see the web page in your browser window.
In summary, the browser parses the HTML and CSS, creates a DOM and CSSOM, combines them to form a Render Tree, calculates the layout, paints the page, and displays it on your screen. That’s how a browser renders a web page. This entire sequence happens quickly, making it seem like web pages load instantaneously.
Now let’s look at each of the steps above in a bit more depth.
HTML parsing is the first step that a browser takes. It reads and interprets the HTML in the page output.
The browser initially determines the character encoding of the HTML document. This is essential for correctly displaying text. Common character encodings include UTF-8 and ISO-8859-1.
The HTML code is then broken down into smaller units called “tokens.” Tokens represent the fundamental building blocks of HTML, such as tags, attributes, and text content. The browser identifies these tokens as it scans through the HTML code.
Tokenisation is a form of lexical analysis: as it produces each token, the browser categorises it into a specific type, like start tags, end tags, attributes, and text content. This helps the browser understand the structure of the document.
The browser uses a parser to analyse the tokens and create a structured tree-like representation known as the Document Object Model (DOM). The DOM represents the hierarchy and relationships of elements in the HTML document. Each HTML element becomes a node in this tree, with parent-child relationships, just like a family tree.
During the parsing process, the browser also checks for syntax errors in the HTML. If it encounters errors like missing closing tags or improperly nested elements, the browser will try to correct them to the best of its ability, following a set of parsing rules known as the “error recovery” algorithm.
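To make tokenisation a little more concrete, here is a minimal sketch using Python’s built-in html.parser module. It is only an illustration: a real browser uses its own, far stricter tokeniser with full error recovery, and the TokenCollector class and tokenise function are names I’ve made up for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records each token the parser reports, in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tokens.append(("start_tag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end_tag", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.tokens.append(("text", data))


def tokenise(html):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


tokens = tokenise('<p class="intro">Hello, <a href="/about">World</a>!</p>')
for token in tokens:
    print(token)
```

Each printed tuple is one token: a start tag with its attributes, an end tag, or a run of text.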
The Document Object Model (DOM).
As parsing continues, the parser keeps adding nodes to the tree-like representation known as the Document Object Model (DOM). The DOM tree is built gradually, and it becomes accessible for the browser to manipulate and render as it grows.
The DOM tree is structured as a hierarchy of nodes, where each node represents an element, attribute, or piece of text in the HTML document. These nodes are organised in a tree-like structure, with one node called the “root” node, representing the entire document. Below the root node, there are child nodes, which represent the elements contained within the document.
There are different types of nodes in the DOM.
- Element Nodes: These represent HTML elements, like <div>, <p>, or <a>. Element nodes can have child nodes that represent their own content, attributes, and nested elements.
- Text Nodes: These represent the text content within an element. For example, if you have <p>Hello, World!</p>, the text node would contain “Hello, World!”.
- Attribute Nodes: These nodes represent attributes of elements. For example, if you have <img src="image.jpg">, there is an attribute node for the src attribute of the <img> element.
- Comment Nodes: These represent HTML comments. For example, <!-- This is a comment --> would be represented as a comment node.
Elements in the DOM tree have parent-child relationships. For example, if you have <div><p>Hello, World!</p></div>:
The <div> element is the parent of the <p> element.
The <p> element is the child of the <div> element.
Elements at the same level in the hierarchy (i.e., with the same parent) are called siblings. For example, if you have <ul><li>One</li><li>Two</li></ul>:
The <li> elements are siblings.
Each element node in the DOM tree can have associated attributes. For example, an <a> (anchor) element can have an href attribute that contains the link’s destination.
The DOM tree is also used for event handling. Developers can attach event listeners to specific elements in the tree, which allows them to respond to user interactions (e.g., clicks, mouseovers) on the web page.
In summary, the DOM tree is a structured, hierarchical representation of an HTML or XML document. It allows developers to access, manipulate, and interact with the content and structure of a web page, making it a fundamental concept in web development and dynamic web applications.
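The parent, child and sibling relationships described above can be sketched in a few lines of Python. This is a toy model, not how a browser stores the DOM: the Node and TreeBuilder classes are hypothetical, and the sketch assumes well-formed markup with no self-closing elements such as <img>.

```python
from html.parser import HTMLParser


class Node:
    """A very small stand-in for a DOM node (hypothetical, not a browser API)."""

    def __init__(self, name, parent=None, text=None):
        self.name = name
        self.parent = parent
        self.text = text
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def siblings(self):
        """Other nodes that share this node's parent."""
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")  # the root node for the whole document
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # A start tag opens a new element nested inside the current one
        self.current = Node(tag, parent=self.current)

    def handle_endtag(self, tag):
        # An end tag closes the current element; step back up to its parent
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            Node("#text", parent=self.current, text=data)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.name if node.text is None else f'{node.name} "{node.text}"'
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


root = build_tree("<div><p>Hello</p><ul><li>One</li><li>Two</li></ul></div>")
print("\n".join(dump(root)))
```

The indentation in the printed output shows the hierarchy: each level of indent is one step further down the tree from the root.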
In practical terms, a site that consists of ONLY a DOM would look very “plain”, with few colours, text in the browser’s default font (such as Times New Roman), and a very sequential page layout, like a big list of page elements.
While parsing the HTML, the browser also looks for and parses any linked or inline CSS (Cascading Style Sheets) files. CSS is used to define the visual style of the web page.
CSS parsing is the process by which a web browser reads and interprets Cascading Style Sheets (CSS) associated with an HTML document. CSS defines the visual styles and layout of HTML elements on a webpage. The parsing process involves the following steps:
- Locating Stylesheets: The browser first looks for any external CSS files linked in the HTML using the <link> element, and any CSS embedded within <style> elements.
- Parsing CSS Rules: The CSS code is parsed to identify CSS rules. CSS rules consist of selectors and declarations. Selectors determine which HTML elements the rule applies to, and declarations specify the style properties and values for those elements.
- Selecting Elements: The browser processes selectors to identify which elements in the DOM they apply to. This is a critical step, as it determines which elements will receive specific styles.
- Cascading and Specificity: The browser takes into account the cascade and specificity rules to resolve conflicting CSS rules. The “cascading” aspect determines the priority of styles when multiple rules target the same element. “Specificity” is a way to calculate which rule should take precedence if there is a conflict.
- Inheritance: If a property is not explicitly defined for an element, the browser checks if the property is inherited from a parent element. If so, it applies that inherited value.
- Computed Styles: After resolving all the rules, the browser computes the final styles for each element, taking into account the specificities and cascade. These computed styles include values for all the CSS properties that apply to the element.
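Cascading and specificity are easier to follow with a worked example. The sketch below is a deliberate simplification: it assumes selectors made only of type names, .classes and #ids separated by spaces, that every rule already matches the element, and it ignores !important, inline styles and stylesheet origin. The function names are my own.

```python
import re


def specificity(selector):
    """Return (ids, classes, types) for a selector built only from
    type names, .classes and #ids, separated by spaces."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # A type name starts a compound selector: at the start or after a space
    types = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return (ids, classes, types)


def winning_declaration(rules, prop):
    """rules: list of (selector, declarations) in source order, all assumed
    to match the element. Returns the value that wins for `prop`."""
    best = None
    for order, (selector, declarations) in enumerate(rules):
        if prop not in declarations:
            continue
        # Higher specificity wins; on a tie, the later rule wins
        key = (specificity(selector), order)
        if best is None or key > best[0]:
            best = (key, declarations[prop])
    return None if best is None else best[1]


rules = [
    ("p", {"color": "black"}),
    (".note", {"color": "blue"}),
    ("#main p", {"color": "red"}),
    (".note", {"font-weight": "bold"}),
]
print(specificity("#main p"))
print(winning_declaration(rules, "color"))
```

Comparing the tuples from left to right mirrors how specificity works: one ID outweighs any number of classes, and one class outweighs any number of type names.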
Just like the DOM tree, the browser creates a CSSOM tree (using linked or inline CSS in the page output). This represents the styles applied to each element in the DOM tree.
CSS Object Model (CSSOM) Tree.
The CSSOM is a structured representation of the CSS styles that are applied to the DOM tree. It works in conjunction with the DOM to determine how elements are styled and how they should be laid out on the web page:
- Creation: As the browser parses CSS rules (as above), it constructs the CSSOM tree. Like the DOM, the CSSOM is a tree, but its nodes represent style sheets and the rules inside them rather than HTML elements.
- CSS Rule Representation: Each node in the CSSOM tree corresponds to a CSS rule. These nodes contain information about the selectors, properties, and values associated with that rule.
- Style Computation: The CSSOM tree holds the computed styles for each element based on the resolved CSS rules. It may also store information about pseudo-elements (e.g., ::before, ::after) and pseudo-classes (e.g., :hover, :focus) and their respective styles.
- Recalculation: When styles need to be updated (e.g., due to user interactions or responsive design changes), the browser recalculates the styles in the CSSOM tree and applies them to the DOM, triggering reflows and repaints if necessary.
In summary, CSS parsing involves interpreting CSS rules and applying them to HTML elements, while the CSSOM tree is a structured representation of these styles that works in tandem with the DOM to render web pages.
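JavaScript is processed alongside the HTML and CSS. When the parser reaches a <script> element, the script’s code is parsed and executed as follows:
- Abstract Syntax Tree (AST) Creation: Once the code is successfully parsed, the browser generates an Abstract Syntax Tree (AST). The AST is a hierarchical representation of the code’s structure and semantics. It helps the browser understand the meaning of the code and how it should be executed.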
- Synchronous Execution: If the script is synchronous (i.e., not marked with the async or defer attributes), it blocks the parsing and rendering of the HTML document until it’s fully executed. This means the browser will execute the script immediately, which can affect page load times.
- Asynchronous Execution: If the script is marked as async, it is downloaded in parallel while the parsing of the HTML document continues, and it is executed as soon as it is available. This can help improve page load times by not blocking the parser during the download, although async scripts are not guaranteed to run in document order.
- Deferred Execution: If the script is marked as defer, it is also downloaded in parallel, but it waits until the HTML parsing is complete before running. Deferred scripts run in document order, so they don’t hold up the parsing of the page but still execute before the DOMContentLoaded event.
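As a rough illustration of the three cases above, this sketch scans some HTML for <script> tags and reports how each would be loaded. It assumes classic (non-module) scripts, and the loading_mode function and ScriptFinder class are hypothetical names of my own rather than anything a browser exposes.

```python
from html.parser import HTMLParser


def loading_mode(attrs):
    """Classify a classic <script> by its attributes.
    async and defer only apply to external scripts (those with a src)."""
    if "src" not in attrs:
        return "blocking"
    if "async" in attrs:
        return "async"  # async takes precedence if both are present
    if "defer" in attrs:
        return "defer"
    return "blocking"


class ScriptFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            attrs = dict(attrs)
            source = attrs.get("src", "(inline)")
            self.scripts.append((source, loading_mode(attrs)))


page = """
<head>
  <script src="a.js"></script>
  <script src="b.js" async></script>
  <script src="c.js" defer></script>
  <script>console.log("inline");</script>
</head>
"""

finder = ScriptFinder()
finder.feed(page)
finder.close()
for source, mode in finder.scripts:
    print(source, mode)
```

Running something like this over your own page output is a quick way to spot scripts that will block the parser.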
The render tree.
The Render Tree is a crucial concept in web page rendering, as it represents the elements and styles that will be displayed on the screen. It’s an intermediary step between the Document Object Model (DOM) and the actual rendering of the page. Here’s a more detailed explanation of the Render Tree:
Creation of the Render Tree:
- Combining DOM and CSSOM: The Render Tree is created by combining information from two sources: the DOM (Document Object Model) tree, which represents the structure and content of the HTML document, and the CSSOM (CSS Object Model) tree, which represents the applied styles and layout information.
- Filtering out Non-visual Elements: The Render Tree includes only the elements that are intended to be displayed on the screen. Non-visual elements (e.g., <head>, <script>, and elements hidden with CSS properties like display: none) are not included. Elements that are simply outside the viewport, or hidden with visibility: hidden, still take up space in the layout and so remain in the tree.
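The two steps above (combining the DOM with styles, then filtering out non-visual elements) can be sketched as a small recursive function. The dictionary shapes used for the DOM and the computed styles are assumptions made for this example only; browsers use their own internal structures.

```python
NON_VISUAL_TAGS = {"head", "script", "style", "meta", "link", "title"}


def build_render_tree(node, styles):
    """node: {"tag": str, "id": str (optional), "children": [...]}
    styles: maps an element id to its computed style dict (hypothetical shape).
    Returns a render node, or None if the element is not displayed."""
    if node["tag"] in NON_VISUAL_TAGS:
        return None
    style = styles.get(node.get("id"), {})
    if style.get("display") == "none":
        return None  # the element and its whole subtree are dropped
    children = []
    for child in node.get("children", []):
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            children.append(rendered)
    return {"tag": node["tag"], "style": style, "children": children}


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {
            "tag": "body",
            "children": [
                {"tag": "h1", "id": "heading"},
                {"tag": "p", "id": "hidden-note"},
                {"tag": "p", "id": "invisible-note"},
            ],
        },
    ],
}
computed_styles = {
    "heading": {"display": "block", "color": "navy"},
    "hidden-note": {"display": "none"},
    "invisible-note": {"display": "block", "visibility": "hidden"},
}

render_tree = build_render_tree(dom, computed_styles)
body = render_tree["children"][0]
print([child["tag"] for child in body["children"]])
```

Note that the paragraph with visibility: hidden stays in the tree because it still occupies space, while the one with display: none is removed entirely.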
Characteristics of the Render Tree
- Elements with Computed Styles: Each node in the Render Tree represents an element from the DOM. These nodes are associated with their computed styles. These styles are determined based on the resolved CSS rules, taking into account cascade, specificity, and inheritance.
- Paint Order: The Render Tree defines the order in which elements should be painted on the screen. This order is essential for rendering efficiency and handling layers. Elements are ordered according to their position in the DOM tree, and CSS properties like z-index and stacking context rules affect the paint order.
- Layout Information: Each node in the Render Tree is given layout information, such as its dimensions and position, during the layout step. These values are calculated based on the styles applied to the elements. Layout information is crucial for rendering the web page correctly, as it determines where each element should be placed on the screen.
Purpose of the Render Tree
- Painting: The Render Tree serves as the basis for the painting process. When the browser paints the web page on the screen, it uses the information in the Render Tree to draw each element in the correct order and position. This process is known as “painting.”
- Efficiency: The Render Tree’s structure and ordering of elements are designed for efficient rendering. Elements with similar styles are grouped together, and elements that don’t contribute to the visual layout are excluded.
- Optimisations: Browsers often apply optimisations to the Render Tree. For example, they may create layers to improve performance. Layers can be used for hardware acceleration and are especially useful for animations and transitions.
Manipulation and Interactivity
- Event listeners are attached to elements in the DOM. When a handler responds to a user interaction (a click, mouseover, or keyboard input) by changing the DOM or its styles, the browser updates the affected parts of the Render Tree.
In summary, the Render Tree is a representation of the visible and stylised elements of a web page. It is created by combining information from the DOM and CSSOM and is used as the basis for the painting process, optimising rendering performance and allowing for dynamic interactivity on web pages.
Layout and Reflow.
The “layout” or “reflow” is a crucial step in the process of rendering a web page. It determines how elements are positioned and sized within the viewport, considering factors like content, styles, and screen dimensions. Here’s a more detailed explanation of the layout process in web page rendering:
- Recalculating Styles: Before layout, the browser may need to recalculate styles, especially when there are dynamic changes (e.g., user interactions, media queries). The browser computes the styles of each element in the Render Tree, taking into account resolved CSS rules. This includes properties like width, height, margin, padding, and positioning.
- Computing Dimensions: Layout calculates the dimensions and positions of all elements in the Render Tree. It takes into account the content size, padding, borders, margins, and other style properties. Elements are positioned relative to their containing elements and the viewport, and their final dimensions are determined.
- Box Model: The layout process is based on the box model, which defines an element’s content area, padding, border, and margin. These values are computed and considered during layout. Margins may collapse under certain conditions, and this can affect the positioning of elements.
- Parent-Child Relationships: The layout process ensures that child elements are correctly positioned and sized within their parent elements. This involves computing the space that each child element occupies within its parent container.
- Handling Floats and Clearing: Floating elements (e.g., floating images or text) are taken into account during layout. Elements can float left or right, and non-floating elements may wrap around them. Clearing is used to prevent elements from wrapping around floated elements. Elements with the clear property ensure they don’t overlap floated elements.
- Determining Position and Flow: Layout also establishes the flow and positioning of elements. Elements can be positioned using properties like position: absolute, position: relative, or position: fixed. The display property (e.g., display: block, display: inline) influences how elements are displayed and flow within the layout.
- Handling Overflow: The layout process accounts for elements that overflow their containing elements, and scrollbars are added when necessary to allow users to view the full content of an overflowing element.
- Layout Triggers: Certain actions trigger layout recalculations, including:
  - Changes in the DOM structure (e.g., adding or removing elements).
  - Changes in styles (e.g., modifying an element’s width or height).
  - Changes in viewport size.
  - User interactions (e.g., resizing the browser window).
- Performance Considerations: Layout is a computationally intensive process. Browsers aim to optimise layout calculations to improve performance. For example, they may create layers and use hardware acceleration to speed up rendering.
In summary, the layout process is responsible for determining the size, position, and flow of elements on a web page. It takes into account styles, dimensions, and relationships between elements and is crucial for rendering a web page correctly within the viewport. Layout recalculations can be triggered by changes in the DOM, styles, viewport size, and user interactions, and optimising layout performance is a priority for web browsers.
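The box model arithmetic behind layout is simple enough to show directly. This sketch assumes the same padding, border and margin on the left and right of a box, and only deals with widths; the Box class and content_width_for function are illustrative names, not part of any browser API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    content_width: float
    padding: float = 0.0  # applied to both left and right, for simplicity
    border: float = 0.0
    margin: float = 0.0

    def border_box_width(self):
        """Content plus padding and border on both sides."""
        return self.content_width + 2 * self.padding + 2 * self.border

    def margin_box_width(self):
        """The total horizontal space the box takes up in its parent."""
        return self.border_box_width() + 2 * self.margin


def content_width_for(available_width, padding, border, margin):
    """Width left for content when a block-level box fills its container."""
    return max(0.0, available_width - 2 * (padding + border + margin))


box = Box(content_width=300, padding=20, border=5, margin=10)
print(box.border_box_width())
print(box.margin_box_width())
print(content_width_for(800, padding=20, border=5, margin=10))
```

Here a 300px wide content area ends up occupying 370px of its parent once padding, border and margin are added, which is why a change to any of those values can trigger a reflow of neighbouring elements.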
Painting is a critical step in the process of rendering a web page. It involves converting the visual representation of a web page, as defined by the Render Tree and its associated styles, into pixels that can be displayed on your screen. Here’s a detailed explanation of the painting process:
Elements in the Render Tree:
- Painting starts with the Render Tree, which represents the visible and stylised elements of a web page.
- Elements in the Render Tree are ordered based on the CSS stacking context and their position in the DOM. This order is crucial for correctly rendering elements that might overlap.
- To optimize rendering performance, modern browsers often divide the Render Tree into layers. Layers can be thought of as separate planes, each containing a subset of elements. They can be hardware-accelerated, providing smoother animations and transitions.
- Layers are created based on the characteristics of elements. For example, elements with opacity, transforms, or video content may be placed in separate layers.
- Rasterisation is the process of converting elements into bitmaps (images). Each element in the Render Tree, or layer in some cases, is turned into a bitmap.
- Rasterisation takes into account the element’s dimensions, styles, and properties like opacity, background, borders, and text.
- Text elements are rasterised by rendering the actual characters and fonts into bitmaps.
- If layers are used, the bitmaps of these layers are combined in a specific order. This process considers the stacking context defined by CSS, including z-index and the positioning of elements.
- The stacking order ensures that elements appearing later in the document, or with higher z-index values, are drawn on top of the elements below them.
Creating a Composite:
- After combining the bitmaps, the browser creates a final composite of the web page’s content. This composite is what you see on your screen.
- The composite is efficiently drawn on the screen, taking into account the viewport and the user’s viewable area.
Repaints and Invalidations:
- Not all rendering operations require repainting the entire web page. Browsers aim to optimize rendering by minimising repaints.
- Repaints may occur when an element changes its styles, content, or position, triggering the need to redraw part of the page.
- Browsers often use techniques like “dirty rectangles” to identify and repaint only the parts of the page that have changed.
- Performance optimisation is a key consideration in painting. Browsers may cache previously painted content, use hardware acceleration, and take other steps to ensure efficient rendering.
In summary, painting is the process of converting the visual representation of a web page into pixel data that can be displayed on the screen. It involves creating layers, rasterizing elements, combining bitmaps, and optimising rendering to provide a smooth and efficient user experience. The careful management of layers, minimizing repaints, and using hardware acceleration are essential techniques for optimising painting in web browsers.
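Paint order can be sketched as a sort. This is a heavy simplification that assumes every element sits in a single stacking context and has a numeric z-index; real browsers also account for nested stacking contexts, floats and more. The paint_order function is a made-up name for this example.

```python
def paint_order(elements):
    """elements: list of (name, z_index) in document order.
    Returns names from back (painted first) to front (painted last)."""
    indexed = list(enumerate(elements))
    # Lower z-index paints first; document order breaks ties
    indexed.sort(key=lambda item: (item[1][1], item[0]))
    return [name for _, (name, _) in indexed]


elements = [
    ("header", 10),
    ("content", 0),
    ("sidebar", 0),
    ("modal", 100),
    ("background", -1),
]
print(paint_order(elements))
```

The last name in the printed list is the one that ends up on top, because it is painted after everything else.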
Rasterisation is a critical step in the process of rendering a web page. It involves converting vector-based information, which defines shapes, text, and other visual elements, into a grid of pixels. This process allows these elements to be displayed on a screen or other output device. Here’s a more detailed explanation of rasterisation:
Vector Graphics vs. Raster Graphics
- Vector Graphics: In web design, visual elements like text, shapes, and paths are initially represented in vector format. Vector graphics define these elements using mathematical formulas, making them resolution-independent and infinitely scalable without loss of quality. They are represented as paths, lines, and curves.
- Raster Graphics: Displays, including computer screens and printers, operate in a raster format, which consists of a grid of pixels. Raster graphics are made up of individual pixels and are resolution-dependent. Rasterisation is the process of converting vector graphics into these pixel-based images.
- In the rendering process, visual elements on a web page, as defined by the Render Tree and CSS styles, are prepared for rasterisation. This includes elements like text, images, and shapes.
- Text is often among the more complex content to rasterise, as it requires rendering individual characters (glyphs) from font data.
- Rasterisation often involves anti-aliasing, a technique used to smooth the jagged or pixelated edges of shapes and text. This makes images appear less blocky and more visually pleasing.
- Anti-aliasing achieves this effect by blending the pixel colours at the edges of shapes, creating a smoother transition from the object to its background.
- When rasterising an image, it’s important to consider sampling, which determines how the vector-based image is converted into pixel data.
- Sampling involves taking multiple samples from the vector graphic and determining the colour and intensity of each pixel based on the properties of the samples.
- Elements with transparency, such as partially transparent images or overlaid text with opacity, require special handling during rasterisation.
- The process of calculating the correct colours and blending them to represent transparency can be complex.
- Once rasterisation is complete, the visual elements are represented as a grid of pixels, similar to how digital photographs are made up of tiny squares.
- The resulting bitmap image is then combined with other images, text, and graphical elements to create a composite that represents the web page’s content.
Display and Performance
- The rasterised content is ready for display on the screen. Displaying rasterised images is efficient and fast because screens are inherently pixel-based.
- Performance considerations during rasterisation include optimising the process to ensure smooth and responsive web page rendering.
In summary, rasterisation is the process of converting vector-based visual elements on a web page into a grid of pixels, allowing them to be displayed on screens or other output devices. This process includes rendering text, shapes, and images, applying anti-aliasing for smooth edges, handling transparency, and ensuring efficient performance. Rasterisation is a critical step in the overall rendering process, ensuring that the web page appears as intended to the user.
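Both anti-aliasing and transparency come down to blending a foreground colour with whatever is behind it. The sketch below shows the basic weighted mix, assuming plain 0 to 255 RGB values and ignoring gamma correction and colour spaces, which real rasterisers have to consider. The blend function is a hypothetical helper.

```python
def blend(foreground, background, coverage):
    """Mix two RGB colours. coverage is the fraction (0.0 to 1.0) of the
    pixel covered by the foreground shape, or the foreground's opacity."""
    return tuple(
        round(f * coverage + b * (1.0 - coverage))
        for f, b in zip(foreground, background)
    )


black = (0, 0, 0)
white = (255, 255, 255)

# An edge pixel that a black shape only covers a quarter of
print(blend(black, white, 0.25))
# A pixel fully inside the shape
print(blend(black, white, 1.0))
```

An edge pixel that is only partly covered comes out as a shade of grey rather than pure black or white, which is what softens the jagged edges.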
The “display” phase in web page rendering is the final step in the process of how a browser renders a web page. It involves showing the rendered content, which includes text, images, graphics, and interactive elements, on the user’s screen. Here’s a detailed explanation of the display phase:
- The display phase takes the final composite of the web page, which was created during the painting phase, and prepares it for rendering on the screen.
- This composite is a combination of all the visual elements (text, images, backgrounds, shapes, etc.) on the page, organised and layered according to the z-index, stacking context, and CSS rules.
Rendering Engine Interaction:
- The web browser’s rendering engine interacts with the operating system and hardware to display the content.
- The rendering engine sends the visual data to the graphics subsystem of the operating system, which is responsible for rendering the web page on the screen.
- The rendering engine turns the composite into individual pixels that can be displayed on the screen. Each pixel’s colour and position are determined based on the information provided by the composite.
- The output is displayed on the screen in real-time, providing a visual representation of the web page.
- The graphics subsystem refreshes the display frame by frame, so that any animations or changes are reflected as quickly as possible.
Handling User Interactions:
- During the display phase, the browser remains responsive to user interactions, such as clicks, mouse movements, and keyboard inputs. The browser processes these inputs and updates the display accordingly.
Responsiveness and Smoothness:
- A well-optimised display phase ensures a smooth and responsive user experience. This is especially important for web applications, games, and multimedia content that rely on fast and accurate display updates.
- Many modern browsers use hardware acceleration to improve display performance. This involves leveraging the capabilities of the user’s graphics hardware (e.g., GPU) to offload some of the rendering work and ensure smoother and more efficient display.
- The frame rate is a measure of how many times per second the display is refreshed. A common target for smooth web page rendering is 60 frames per second (FPS), but this can vary depending on the device’s capabilities and the complexity of the web page.
- The user can interact with the displayed content, such as clicking links, submitting forms, or navigating through the site. User interactions trigger changes in the DOM and may initiate new rendering cycles.
In summary, the display phase is the final step in how a browser renders a web page, where the rendered content is shown on the user’s screen. This process includes converting the visual composite into pixels, continuous updates for responsiveness and smoothness, and leveraging hardware acceleration for efficient rendering. The display phase ensures that the web page is presented to the user as intended, allowing for interaction and providing a seamless user experience.
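The 60 FPS target mentioned above translates into a time budget per frame, which is a handy number to keep in mind when reading performance reports. This is a back-of-the-envelope sketch with function names of my own, and it assumes work that overruns the budget simply spills into the following frames.

```python
import math


def frame_budget_ms(frames_per_second):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def dropped_frames(work_ms, frames_per_second=60):
    """How many extra frame slots a piece of work of `work_ms` spills into."""
    budget = frame_budget_ms(frames_per_second)
    return max(0, math.ceil(work_ms / budget) - 1)


print(round(frame_budget_ms(60), 2))
print(dropped_frames(10))
print(dropped_frames(40))
```

At 60 FPS there are roughly 16.67 milliseconds to get everything done for a frame, so a 40 millisecond task means a couple of frames are missed and the page may appear to stutter.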
There’s quite a lot involved in how a browser renders a web page. Although this all happens almost instantaneously from a human perspective, it’s a good idea to gain an understanding of how a browser renders a web page, as this will help you optimise your page output for improved performance.
- HTML is used to create the Document Object Model (DOM)
- CSS is used to create the CSS Object Model (CSSOM)
- The CSSOM is applied to the DOM to make the render tree
- JavaScript is parsed into an AST and executed, and can manipulate the DOM and the render tree
- The browser calculates the page layout
- The browser then paints the page
- The page becomes interactive
- The page updates according to event listeners and user interaction