PWS Ltd web editorial services

PDF/UA and skipped headings

4 December 2012

Summary

In July 2012 the International Organization for Standardization (ISO) published its first ever PDF accessibility standard, ISO 14289-1, otherwise known as PDF/UA. The publication of PDF/UA is in many respects a welcome step forward. However, this article will look at its prescriptions for headings which, it will be argued, are problematical.

The problem

Specifically, PDF/UA, paragraph 7.4.2, forbids the practice of skipping heading levels in PDFs. It states:

"If document semantics require a descending sequence of headers, such a sequence shall proceed in strict numerical order and shall not skip an intervening heading level. H1 H2 H3 is permissible, while H1 H3 is not." (emphasis added)

For want of a better term, this article will refer to the first type of heading structure described above (H1 H2 H3…) as a "strictly sequential pattern of headings". The use of this type of structure is, of course, in no way controversial.
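The rule quoted above can be expressed as a simple check over a document's heading levels taken in reading order. The following is a minimal sketch in Python, not code from PDF/UA or from any accessibility checker. The function name `find_skipped_levels` and the representation of headings as a list of integers are assumptions made purely for illustration. It reads the rule as: a heading may sit at most one level below the heading before it.

```python
def find_skipped_levels(levels):
    """Return (index, previous_level, level) for each heading that
    descends more than one level below the heading before it.

    `levels` is a list of heading levels in reading order, e.g. [1, 2, 3].
    The document is treated as starting at level 0, so a first heading
    other than H1 is also reported (an assumption of this sketch).
    """
    skips = []
    previous = 0
    for index, level in enumerate(levels):
        if level > previous + 1:
            skips.append((index, previous, level))
        previous = level
    return skips


print(find_skipped_levels([1, 2, 3]))  # H1 H2 H3
print(find_skipped_levels([1, 3]))     # H1 H3
```

Note that moving back up the hierarchy (for example from an H3 to an H2) is never reported: the rule as quoted concerns descending sequences only.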

The objection

However, the central contention here is that a plausible case for a complete ban on skipped heading levels in PDFs (such as from H1 to H3) has not been made, and that it probably can't be.

Why it matters

It will be argued below that, in some circumstances, skipping a heading level can actually lead to greater clarity and intelligibility, and hence better accessibility. It will also be argued that indiscriminately following the no-skipped-headings rule can create problems that would not otherwise exist.

The need for a reliable benchmark

Currently, many organisations rely heavily (if not completely) on automated PDF accessibility checkers for their benchmark. This is a problem for a number of reasons, including that automated checkers are often highly unreliable in measuring a document's actual accessibility (for more detail on this point see Evaluating the Acrobat PDF accessibility checker). Furthermore, some techniques that would enhance the accessibility of a PDF (fixing footnotes comes most readily to mind) can also make the document fail an automated check, whereas leaving the problem unfixed would not register as a problem at all. In short, over-reliance on automated accessibility checkers can, and does, lead to serious problems.

Standards such as PDF/UA and WCAG 2.0 offer a much better solution. However, it will be argued that, because of PDF/UA's inclusion of a no-skipped-headings rule, document authors may now be faced with a straight choice between "make it accessible" and "make it PDF/UA-compliant".

Article structure

This article is divided into two sections. Section 1 looks at the arguments previously put forward in defence of an outright ban on skipped headings in PDFs, and explains in detail the problems with each argument.

Section 2 provides examples of when following a no-skipped-headings rule without sufficient regard for the actual structure of the document may result in some content being less accessible than it would otherwise be. These use-cases, all of which are based on real-world documents, cover a range of situations, but by no means exhaust the list of possible examples.

What is not in dispute

Before looking in detail at the arguments for an outright ban, the first point that needs to be stressed is that no one would dispute that, most of the time, it is best practice for accessibility purposes to follow a no-skipped-headings rule.

However, "most of the time" is (obviously) not the same thing as "all of the time". In order to justify an outright ban you absolutely must show that breaking the no-skipped-headings rule will always create an accessibility problem, with no exceptions.

Section 1: the case for banning skipped headings

Argument 1: "you're in the minority"

Surprisingly perhaps, one of the most common arguments for banning skipped heading levels is that those in favour of such a ban constitute a significant majority of those in the trade, as it were.

What's wrong with this argument?

There are, of course, more serious objections than this, but because this argument is so commonly employed it does need to be addressed briefly. Its refutation is simple: the number of people who believe a proposition to be true tells you absolutely nothing about whether that proposition is actually true or not (think Flat Earth or Creationism).

I will resist the temptation to expand on this theme here, and just leave the last (but one) word on the subject to Peter Atkins (from On Being, Oxford, 2011, page xi):

"…reliable knowledge is not secured by majority vote."

Quite.

Argument 2: navigation or decoration?

Between April and July 2012 Duff Johnson wrote a series of blog posts setting out his views on why you should never skip headings in PDFs. I will examine each of these arguments in turn, starting with the theme of the first of them: Heading Levels: Navigation or Decoration? (5 April 2012).

In this article he states:

"In a world where heading levels are used for styling, users who depend upon structure for navigation simply cannot trust the heading levels they encounter. Longer documents are, consequently, far harder to navigate."

He makes a similar point in a post to a WebAIM forum thread (6 July 2012), entitled Left Column and Heading Level Order, where he argues that:

[skipped heading levels give readers] "…less reason to trust that the heading levels they encounter will usefully represent logical subsections of content. Instead, they conclude that the page's author must be of the sort who thinks that ‘Our H3 style is the perfect font, size and color to be used for the company's name in an address block in the footer …"

What's wrong with this argument?

These statements raise a number of questions about the expectations of AT (assistive technology) users, the significance of document length, and even the meaning of the word "logical" in this context. All of these issues and more will be addressed in due course, but first let's deal with the specific question of "navigation or decoration?".

The question itself is a problem because it sets up a false dichotomy, namely that there are only two ways to address the issue:

  • Strategy 1: determine heading levels according to visual styling considerations.
  • Strategy 2: use a strictly sequential pattern of headings.

However, strategy 1 can be dealt with easily: nobody is advocating it. Not for one moment is anybody suggesting that heading levels should be determined according to visual presentation requirements. This is just a straw man. The real question is: is strategy 2 above sufficient to ensure optimal accessibility in all cases, or is strategy 3 below perhaps a better bet?

  • Strategy 3: recognise that some documents are quite legitimately structured in such a way that automatically and invariably following a strictly sequential pattern of headings will make for a poorer user experience than would otherwise be the case.

We will look at this question in detail throughout the rest of this article.

Argument 3: skipped headings = random headings

Briefly, a similar argument to the above holds that an enforced no-skipped-headings rule is the only way to prevent the practice of heading levels being chosen without due regard to appropriate document semantics (irrespective of visual styling considerations).

What's wrong with this argument?

Once again, nobody is arguing otherwise. No one is agitating for loose or sloppy document structure, so this is just another straw man.

Argument 4: "PDF is different"

A fourth argument, and one that warrants more detailed examination, is that PDF is somehow different to HTML (in ways that are relevant to the present context, of course).

In Heading Levels: Navigation or Decoration? Duff Johnson compares the different ways in which the issue is addressed in HTML 4, ISO 32000-1 (the PDF standard), WCAG 2.0, and PDF/UA. As he points out, HTML 4 permits skipped headings, but, importantly, both ISO 32000-1 and WCAG 2.0 are ambiguous on this point.

WCAG 2.0's ambiguity on skipped headings

The fact that WCAG 2.0 is ambiguous with respect to skipped headings may well come as a surprise to many. However, in a LinkedIn discussion on this subject early in 2012, Bruce Bailey, a long-serving Invited Expert on the W3C WAI GL working group, explained that:

"requests to add skipped levels as a documented ‘Common Failure’ under WCAG 2.0 had come up a few times…"

but that

"…it does not seem entirely compelling that skipping levels always creates problems and confusion" (emphasis in original).

He added:

"We don't think there should be a blanket prohibition, but we try not to write anything which encourages their use."

WCAG 2.0's ambiguity on this issue is also reflected in its provision of links from each of its H69 and H42 techniques pages to two articles, one arguing for and the other arguing against permitting skipped headings.

Pick a heading

The case for permitting skipped headings is made by long-time standards advocate Eric Meyer, in a blog post entitled Pick a Heading. Here he sets out two examples of when skipping a heading level will, in his view, serve the needs of the content better than would adhering to a strictly sequential pattern. He sums up his position thus:

"I approach headings as merely indicating a level of importance, and don't bind myself to a decreasing numeric order. That's my take on it; others feel differently."

HTML 4-specific?

Duff Johnson dismisses these examples as HTML 4-specific and hence not relevant to PDF. However, there is absolutely nothing in Eric Meyer's post to suggest that it applies only to HTML 4. It is certainly true that he talks in terms of heading levels denoting "importance" (using the same terminology as does the HTML 4 specification), but that doesn't mean for one moment that the examples he gives would not also apply in other technologies. On the contrary, he is simply making an argument about how content in general might be marked up in a way that best makes sense to the end user. And, as will be seen, far from being HTML 4-specific, his examples very closely match those given in two of the use-cases below.

The need to establish PDF's distinctiveness

Because of WCAG 2.0's ambiguity on skipped headings, in order to justify PDF/UA's absolute ban it obviously becomes necessary to establish that PDF is somehow different to HTML in some relevant way.

A general problem

However, the weakness of any general argument that PDF is materially different to HTML will become apparent immediately if you consider placing exactly the same piece of content (containing one or more skipped heading levels) into both an HTML page and a PDF simultaneously. You will then be obliged to argue that one can be accessible, but that the other cannot. This clearly is an untenable position, and hence a very serious problem for the "PDF is different" position. But it's not the only one.

So what, specifically, is different about PDF?

In WebAIM's Survey: Headings Matter to Users of Assistive Technologies, Duff Johnson attempts to build the case for PDF's distinctiveness thus:

"…headings are critical for navigation via assistive technology almost 100% of the time in PDF files. Why? PDFs almost never include internal navigation links or detailed bookmarks. Users who must use assistive technology in order to read generally depend completely on headings for navigation in PDF files."

What's wrong with this argument?

While it is certainly true that headings are important, there are in fact many ways to navigate a PDF. These include, as Duff points out, bookmarks and tables of contents and, not mentioned, page numbering. And although it is also true that many PDFs fail to get bookmarks, tables of contents and page numbering right (including the PDF/UA document itself, as of 23 October 2012), this is not an argument for never skipping a heading level: it's just an argument for ensuring that these vital navigation aids are provided correctly.

Duff continues:

"The [WebAIM] survey's result is precisely in line with my April, 2012 article about the use of heading levels by users of assistive technology."

He adds:

"There are three reasons for [PDF's distinctiveness]:

  • In PDF, headings are the only means of content-based navigation available to people who use AT.
  • PDF files do not typically include links for intra-document navigational purposes.
  • Usage[s] of PDF commonly include long documents (notably, far longer than web pages)."

And finally he states:

"The latest WebAIM survey … clearly validates the requirements regarding heading levels in PDF/UA."

What's wrong with these arguments?

The first problem here is that Duff's April 2012 article very strongly advocates the no-skipped-headings rule. However, the WebAIM survey simply states:

"These responses show a very strong usefulness of appropriately structured heading levels."

The survey does not state anywhere that "appropriately structured heading levels" necessarily equates to "a strictly sequential pattern of headings", hence it does not validate the PDF/UA position. Furthermore, the three premises that Duff sets out for PDF's distinctiveness are problematical.

In shorthand these are:

  • "headings are the only effective means of navigation" (as we have seen, they're not)
  • "authors don't include other navigational aids" (they should)
  • "PDFs are long" (some are, some aren't)

"PDF is distinctive (2)"

In Heading Levels: Navigation or Decoration? Duff takes a similar approach to trying to establish the distinctiveness argument with the following three points:

"…the vast majority of HTML pages contain only a few headings. Ignore the levels and AT users may still readily ‘scan’ the whole page for headings and navigate accordingly. Consequently, for most web content, headings themselves may matter, heading levels matter a lot less."

"…For effective navigation with AT on [larger] documents, heading levels become increasingly important as a function of both the volume of headings and the number of heading levels in use."

And (crucially):

"…The use cases that prompted the PDF/UA Committee to develop normative language for heading levels are those longer and more structurally complex documents." (emphasis added)

In short, all of the above "PDF is distinctive" arguments boil down to the following:

  • Premise 1: PDFs tend to be longer and structurally more complex than HTML-based documents.
  • Premise 2: headings and heading levels become increasingly important for effective navigation in longer and more complex documents.
  • Conclusion: it is necessary to ban skipped headings in PDFs.

What's wrong with this argument?

It is explicitly stated that the PDF/UA position is predicated on long and structurally complex PDFs. The problem with this, of course, is that it speaks only to longer PDFs. It makes no claims at all about shorter PDFs, and therefore fails completely to substantiate the claim that skipped headings must be banned for all PDFs.

In addition, the problem with premise 2 is that it just assumes that the fact that heading levels are important for AT users equates to the claim that they must be numbered strictly sequentially. As will be discussed in more detail below, nowhere is this point actually established—it is only ever assumed and asserted to be the case.

In short, the conclusion just does not follow from the given premises.

Argument 5: the effect on AT users

The following argument was made in Heading Levels: Navigation or Decoration?

"Discussions … with real-world AT users have made it clear that failing to insist on correct document structure drains heading enumeration of navigational value. AT users are condemned to blundering from heading to heading instead of relying on heading levels to locate content." (emphasis added)

What's wrong with this argument?

Let's first look at the claim that "failing to insist on correct document structure drains heading enumeration of navigational value" (emphasis added). There are three points to make here.

Firstly, if you look closely, you will see that there is a danger of circular reasoning creeping in here, as follows. The central point that Duff is trying to prove is "this is the correct way to create headings". But this conclusion is based on the premise "this is the correct way to create headings" (as in "correct document structure"). This is, of course, a completely circular argument, and hence no argument at all.

Secondly, given that we know that a strictly sequential pattern of headings is good for accessibility most of the time, it really wouldn't be at all surprising if AT users came to the above conclusions, most of the time. The challenge, of course, would be to come up with this finding for all PDFs, as you must do in order to justify a blanket ban. But Duff actually makes no such claim, and the prospects of being able to do so seem slim.

Which brings me to my third point, which is that there is ample anecdotal (but nevertheless verifiable) counter-evidence. My recent discussions with real-world AT users with respect to the use-cases below are at odds with the claim that "AT users are condemned to blundering from heading to heading instead of relying on heading levels to locate content." On the contrary, one "real-world AT user", when presented with use-case 1 below, described the reason for the existence of the document's one skipped heading as "sound thinking".

The problems of terminology revisited

Returning briefly to the topic of terminology, we previously looked at the problems of assuming that "correct document structure" and a strictly sequential pattern of headings were synonymous. Many similar (potentially circular) examples can be found in the previously cited articles including "effective navigation", "navigationally reliable", "valid navigable structure" and "proper implementation of heading levels". But more importantly perhaps, Duff regularly uses the word "logical" (as in "logical structure", "logical heading structures", "logical heading levels" etc) in much the same way, such as in the claim that:

"PDF/UA is distinctive in that it requires that heading levels follow a logical sequence."

In fact, Section 7.4 of PDF/UA doesn't actually mention the word "logical" at all. It is just assumed here that "logical sequence" and a strictly sequential pattern of headings are the same thing. But is this assumption justified?

(1, 2, 3…) or (H1 H2 H3…) certainly are mathematically logical sequences, but so too are (1, 4, 9, 16…) (squares of integers), and (2, 3, 5, 7, 11…) (prime numbers). The point is, you don't have to increment by one (and only one) from one member of a set to the next for a sequence to be logically valid.

Nor does it seem that the use of "logical" in this context is justified on a more everyday understanding of its meaning, such as: "principles governing correct or reliable inferences" or "reason or sound judgment". As will be seen, section 2 contains several examples in which indiscriminately following the no-skipped-headings rule would lead to obviously unreliable inferences or unsound judgments being made about content and document structure.

"PDF is different" revisited

Lastly with respect to terminology, Duff devotes a whole blog post, Defining "Heading" in HTML and PDF, to why he believes that the meaning of "heading" in a PDF "differs critically" from its meaning in HTML. He concludes that PDF is different because:

"According to ISO 32000-1:2008, PDF headings ONLY convey hierarchy: that is their function" (emphasis in original)

…with the "key takeaway" being:

"‘Headings’ in PDF are what ISO 32000 says they are, period."

The problem with this argument is that it is, of course, possible to have a hierarchical relationship between two members of a set (such as H1 and H4) without the need for the other members of the set (such as H2 and H3) to be present. (By way of illustration, in the Army, if a General, a Major, a Sergeant and a Private are in a room together and the Major and the Sergeant leave the room, does the hierarchical relationship between the General and the Private leave with them? No, of course it doesn't).
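The same point can be made in terms of how a hierarchy is derived from a flat list of heading levels. The following is a minimal sketch, assuming headings are supplied as (level, text) pairs with unique text. The helper `parent_of` is hypothetical and is not part of ISO 32000-1 or of any PDF library. Each heading's parent is taken to be the nearest preceding heading with a lower level number, and that relationship is well defined whether or not the intervening levels are present.

```python
def parent_of(headings):
    """Map each heading's text to the text of its parent heading.

    `headings` is a list of (level, text) pairs in reading order, with
    unique text (an assumption of this sketch). A heading's parent is
    the nearest earlier heading with a smaller level number; top-level
    headings map to None.
    """
    parents = {}
    stack = []  # open ancestors as (level, text), outermost first
    for level, text in headings:
        # Close every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents[text] = stack[-1][1] if stack else None
        stack.append((level, text))
    return parents


outline = parent_of([(1, "Title"), (4, "Note"), (2, "Chapter 1")])
print(outline)
```

Here the H4 and the H2 both resolve to the H1 as their parent, even though no H3 appears anywhere in the list.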

In addition, it is just not in dispute that ISO 32000-1 is ambiguous on the subject of skipped headings. Therefore, the statement that under ISO 32000-1 "PDF headings ONLY convey hierarchy" does not translate into "heading levels cannot be skipped".

Conclusion of section 1

We have seen that WCAG 2.0 is ambiguous on skipped headings. Therefore PDF/UA's position depends on establishing that PDF is different to HTML in some relevant way. All attempts to do so have failed.

First, the simple exercise of placing the same piece of content in an HTML document and a PDF simultaneously makes the "PDF is different" claim untenable—you will have no choice but to argue that one can be accessible while the other can't, even though the reader will have the same reading experience.

Second, the argument that PDF is distinctive because PDFs tend to be longer than HTML pages is, of course, fatally undermined by the existence of shorter PDFs.

Third, attempting to establish the concept of a heading as different in PDF relative to HTML rests on the ISO 32000-1 definition of the role of a heading—to convey hierarchy. There is no disagreement that ISO 32000-1 is ambiguous on skipped headings, so this argument also fails.

Fourth, my feedback from real-world AT users with respect to the use-cases set out in section 2 below directly contradicts the above cited AT-user testimony.

Clearly, the case for an outright ban on skipped headings in PDFs has not been made.

But there's more

Yet this is still not the whole story. It is, of course, in no way incumbent on those who doubt the wisdom of an outright ban to prove its inadequacy or disutility: the burden of proof obviously must lie with those who would seek to justify such a ban. Nevertheless, as stated previously, section 2 below provides five use-cases that further undermine the PDF/UA position.

Section 2: use-cases requiring skipped headings

Example 1: front matter

The structure of Example 1: front matter is as follows. The main title is tagged with the document's only H1 (see Figure 1 below).

[Image: PDF report front cover]
Figure 1: Front cover with the main title tagged as an H1

Next, the inside cover page contains a section of front matter that is not part of the main content. There are three sub-headings in this section, namely Author, Organisation and Contact. Below each of these are between two and five lines of text (see Figure 2 below).

[Image: Inside cover page with some peripheral content and sub-headings]
Figure 2: Inside cover page with Author, Organisation and Contact sections

The next page sees the start of the document's main content, starting with Chapter 1, which is headed by an H2 (see Figure 3 below).

[Image: PDF report, main content starting on page 3]
Figure 3: Start of the main content, headed by an H2

In shorthand, the heading structure of this document is as follows:

  • H1: Main document title
  • H?: Author, H?: Organisation, H?: Contact
  • H2: Chapter 1
  • H2: Chapter 2
  • H2: Chapter 3
  • H2: Chapter 4

So, how should Author, Organisation and Contact be tagged?

If you were to follow the no-skipped-headings rule you would probably have to tag each of Author, Organisation and Contact as H2s (although other suggested approaches are discussed in the "Anticipating objections" section below). But in doing so you would be sending out the message that they are structurally the equivalent of each chapter, which they clearly are not. (Each chapter in this particular example is three or four pages long, but could just as well be 20 pages or more.)

In fact, most people, when presented with this use-case, suggest that Author, Organisation and Contact should be tagged as H3s, and that tagging them as H2s would be misleading.
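The difference between the two taggings can be illustrated by listing what a reader would find at level 2 in each case. This is a minimal sketch under stated assumptions: the document is modelled as (level, text) pairs, and the names `tag_example_1` and `headings_at_level` are hypothetical helpers, not part of any PDF or assistive technology API.

```python
# Hypothetical model of Example 1: (level, text) pairs in reading order.
CHAPTERS = ["Chapter 1", "Chapter 2", "Chapter 3", "Chapter 4"]
FRONT_MATTER = ["Author", "Organisation", "Contact"]


def tag_example_1(front_matter_level):
    """Build the heading list with the front matter tagged at the given level."""
    headings = [(1, "Main document title")]
    headings += [(front_matter_level, text) for text in FRONT_MATTER]
    headings += [(2, text) for text in CHAPTERS]
    return headings


def headings_at_level(headings, level):
    """Return the text of every heading tagged at `level`, in reading order."""
    return [text for lvl, text in headings if lvl == level]


# Front matter as H2: the level-2 list mixes peripheral labels with chapters.
print(headings_at_level(tag_example_1(2), 2))
# Front matter as H3: the level-2 list contains the chapters only.
print(headings_at_level(tag_example_1(3), 2))
```

Under the H2 tagging, a list of level-2 headings presents Author, Organisation and Contact alongside the four chapters as if they were peers. Under the H3 tagging, the same list contains the chapters alone.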

Comparison with meyerweb.com

This example closely resembles the structure of the home page of Eric Meyer's site meyerweb.com (accessed 29 October 2012), as explained in the previously mentioned Pick a Heading blog post. In meyerweb.com the H4 that labels the search field is doing an almost identical job to Author, Organisation and Contact in Example 1: front matter. Both appear immediately after the H1, but before the main content.

In shorthand, the heading structure is as follows:

  • H1: Main document title
  • H4: Peripheral content (search box)
  • [H2 omitted for editorial reasons—see below]
  • H3: Main content heading

Anticipating objections

Author, Organisation and Contact as P tags

It has been suggested that in a PDF such as Example 1: front matter, Author, Organisation and Contact don't need to be headings at all, and that they could just be marked up as normal text (with P tags). However, "Author" is unambiguously a label for the section of content that contains "Mary Jones, John Joseph and Lena Marsden". As such it should be a heading. The same goes for Organisation and Contact.

But, more importantly, from an end user point of view, if Author, Organisation and Contact were to be marked up with P tags you would lose the ability to locate this section of content by jumping from one heading to another. Such an outcome would be the exact opposite of what the no-skipped-headings rule claims to seek to achieve.

Add an extra H2

It could also be (and has been) argued that an H2 should be added above the Author, Organisation and Contact sections. This assumes, of course, that you have access to the source file and also that you have editorial control over the content. But even if you have both, it's hard to think of any wording for such a heading that wouldn't be editorially superfluous.

Of course, adding editorially superfluous content violates one of the most important principles of writing for the web: no unnecessary words. In the justly famous words of Strunk and White (The Elements of Style, Allyn and Bacon, 1979):

"Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell." (emphasis added)

Eric Meyer, in Pick a Heading, explains his decision to skip the H2 on the home page of meyerweb.com thus:

"I don't have an H2 on the home page because it's reserved for page titles, such as the ‘Eric A. Meyer’ at the top of my personal page. The page title of the home page would be ‘Home Page’ or ‘Main Page’ or something equally silly, so I left it off."

In both Example 1: front matter and the meyerweb.com home page, adding unnecessary content only in order to follow the no-skipped-headings rule would be, as Eric Meyer puts it, "silly". As we will see, the same also applies, in a different context, to the use-case Example 4: a brief history of Formula 1 below.

The "meaning" of skipping

Another objection to the type of structure set out in Example 1: front matter is made by Duff Johnson as follows:

"Some make the case that it can be ‘editorially correct’ to go from H2 to H4, that heading levels should describe the structure of the content, not determine it. To them I say: how does skipping from H2 to H4 (or whatever) ‘describe’ anything? What does it mean to skip more than one level? Whatever you think a given skip means, how is that knowledge communicated to the reader who is depending on heading levels?"

I most definitely am suggesting that heading levels should describe the structure of content and not determine it (in fact, these are my own words being quoted back to me), to which I would also add that to do otherwise would simply be the tail wagging the dog, or colour by numbers.

But to address the specific question of "how does skipping from H2 to H4 … ‘describe’ anything?", let's turn this around and ask "how does skipping from H4 to H2 describe anything?"

In Example 1: front matter, going from an H4 to an H2 tells you, of course, that a new chapter is starting. And it works just the same the other way round. For example, in Example 1: front matter, skipping from the H1 to an H3 conveys the entirely accurate picture that the "Author" heading is not the start of a main section, it is peripheral content—not part of the primary narrative.

Example 2: secondary content in sidebars

Example 2: secondary content in sidebars, like the previous example, also starts with the document title marked up as an H1. Each main section is headed by an H2 and has sub-sections headed by H3s, which in turn have sub-sections headed by H4s.

Lastly, each section also has a sidebar containing information on environmental issues that are related to, but distinct from, the primary narrative. It would be in no way controversial to mark up the headings for each of the environmental issues sections as H5s. The first of these main section pages is shown in Figure 4 below.

[Image: PDF page with H2, H3, H4 and (in the sidebar) H5 heading tags]
Figure 4: "Environmental issues" heading in sidebar is an H5

However, the final page of this document contains just an H2 ("Conclusions"), a couple of concluding paragraphs, and, as with the previous pages, at the foot of the sidebar, another heading and short paragraph about environmental issues. This is shown in Figure 5 below.

[Image: Conclusions page with H2 and (in the sidebar) H5 headings only]
Figure 5: The Conclusions page also has an "Environmental issues" heading in the sidebar

The Conclusions page is a main section in its own right, and so is also headed by an H2 (it needs to be treated in the document's bookmarks and table of contents in the same way as all the other main sections).

So, in shorthand, the heading structure of this document is as follows:

  • H1: Main document title
  • H2: Main section
  • H3: Sub-section
  • H4: Sub-sub-section
  • H5: Environmental issues
  • H2: Main section
  • H3: Sub-section
  • H4: Sub-sub-section
  • H5: Environmental issues
  • H2: Conclusions
  • H?: Environmental issues

In order to be consistent you will have to mark up the "Environmental issues" heading on the Conclusions page in exactly the same way as all the "Environmental issues" headings on the previous pages, that is, as an H5. However, if you are going to follow the no-skipped-headings rule you will almost certainly have to mark it up as an H3.

Anticipating objections to the H5 option

"Environmental issues" as an H3?

With respect to tagging "Environmental issues" as an H3 on the last page (and only the last page), consider the following from Techniques for WCAG 2.0, PDF9:

"Because headings indicate the start of important sections of content, it is possible for assistive technology users to access the list of headings and to jump directly to the appropriate heading and begin reading the content. This ability to ‘skim’ the content through the headings and go directly to content of interest significantly speeds interaction for users who would otherwise access the content slowly."

Figure 6 below shows what such a list of headings looks like in JAWS for Example 2: secondary content in sidebars.

[Image: JAWS headings list]
Figure 6: Headings list generated in JAWS

Each "Environmental issues" heading is listed here as an H5, except the last one (on the Conclusions page) which, following the no-skipped-headings rule, has been marked up as an H3.

What message does this give regarding the document's structure?

The obvious interpretation of this is that the environmental issues section on the Conclusions page is somehow structurally different to all of the other environmental issues sections in the document. It isn't, so marking it up as such is misleading.

What impact would it have on the navigability of the document?

If the user were interested only in navigating to each of the "Environmental issues" sections in turn, they could do so by choosing to display only H5s in this list (by selecting "Level 5 Headings" in the Display option list). Alternatively, the reader might choose to repeatedly press "5" to jump from one H5 to the next. Either way, the last environmental issues section would be missed.
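The effect on level-based navigation can be sketched as follows. This is only a simplified model of stepping through headings of one level; it does not use or reproduce any JAWS interface, and the `headings` list and the `headings_at_level` function are assumptions made for illustration.

```python
# Hypothetical, abbreviated headings list for Example 2, in reading order.
headings = [
    (2, "Main section"),
    (5, "Environmental issues"),
    (2, "Main section"),
    (5, "Environmental issues"),
    (2, "Conclusions"),
    (3, "Environmental issues"),  # retagged as H3 to satisfy the rule
]


def headings_at_level(items, level):
    """Positions a reader reaches when stepping through one level only."""
    return [i for i, (lvl, _) in enumerate(items) if lvl == level]


# Headings reached by filtering on, or stepping through, level 5.
reached = headings_at_level(headings, 5)

# Every "Environmental issues" heading actually present in the document.
all_env = [i for i, (_, title) in enumerate(headings)
           if title == "Environmental issues"]

# Sections the reader never lands on.
missed = [i for i in all_env if i not in reached]
print(missed)
```

Under these assumptions the only heading left in `missed` is the one on the Conclusions page.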

Therefore, it really isn't clear how tagging this heading as an H3 would be consistent with the concepts of "effective navigation", "navigationally reliable", "correct document structure", "logical structure" or the rest. To avoid such problems, the heading markup must describe the structure of the content, not determine it. It would be illogical to do otherwise.

"Environmental issues" as a P tag?

It has actually been suggested to me that in this very example it would be "acceptable" to mark up "Environmental issues" in this context as a paragraph (rather than a heading) in order to avoid tagging it as an H5. However, marking it up as a paragraph would simply cause the environmental issues content to follow on and become indistinguishable from the main "Conclusions" text. And, of course, it would also make it impossible to find via any of the techniques for skimming through headings outlined above. Once again there seems to be only one sensible option here. Tagging all the "Environmental issues" headings as H5s would be logical, consistent and an accurate representation of the document's actual structure.

Example 3: case studies

Example 3: case studies has a similar heading structure to the previous two examples, but is different in that it includes a number of case studies.

Each main section, comprising three or four pages, is headed by an H2, while each main sub-section, comprising one to two pages, is headed by an H3. Lastly, each case study comprises just a single heading and two paragraphs of text.

As shown in Figure 7 below, the heading immediately prior to the first case study is an H4.

PDF page with a case study section
Figure 7: Heading level preceding the case study is an H4

However, as shown in Figure 8 below, the heading preceding the next case study is an H2.

PDF page with a second case study
Figure 8: Heading level preceding the case study is an H2

This pattern repeats itself throughout the rest of the document.

In shorthand, the heading structure for the first part of this document is as follows:

  • H1: Main document title
  • H2: Main section
  • H3: Sub-section
  • H4: Sub-sub-section
  • H?: Case study
  • [More content …]
  • H2: Main section
  • H?: Case study
  • H3: Sub-section
  • H4: Sub-sub-section
  • [More content …]

Case study headings as H3s?

It would make no sense at all to mark up the case study headings as H3s as this would imply they were structurally equivalent to the other H3-headed sections, all of which are one to two pages long (again, these could easily have been made much longer to emphasise the point yet further). It would, of course, make even less sense to tag them as H2s.

However, in order to comply with the no-skipped-headings rule you would be obliged to tag as H3s at least those that were immediately preceded by an H2. The others could still be tagged as H4s, but either way would undoubtedly cause confusion. On the other hand, if they were all H4s, no such confusion would exist.

Anticipating objections

It really is not easy to imagine any objection here. The actual document structure, as well as the need for consistency, strongly suggests that each of these case study sections should have the same level of heading as all the others, and that can only be an H4 (or lower).

Example 4: a brief history of Formula 1

As with the previous examples, the cover page title is this document's one and only H1. Each chapter is a brief history of a particular Formula 1 team, and is headed by an H2.

As can be seen in Figure 9 below, the first chapter is divided into sections, by decade, each of which (with one exception that I will come to shortly) is headed by an H3. Finally, each decade has one sub-section, headed by an H4, about the drivers employed by the team in that particular decade.

PDF content page. History of McLaren Formula 1 team
Figure 9: Heading structure jumps from chapter heading H2 to an H4.

The issue

As can be seen, this document's heading structure jumps from the H2 chapter heading to the first H4 ("Drivers in the 1960s").

…and the objections

Add the extra H3 ("The 1960s")

In attempting to defend the no-skipped-headings position it has been suggested to me that in this exact use-case it would be "acceptable" to add an extra H3 ("The 1960s") after the H2. However, as can plainly be seen from the first three words of the text ("Founded in 1963…"), the story starts in the 1960s, so the addition of such a heading would be needless duplication and editorially entirely superfluous.

Tag "Drivers in the 1960s" as an H3

It has also been suggested that it would be "acceptable" to tag the first drivers' section heading as an H3, even though all the other drivers' sections are headed by H4s. However, as we have seen with Example 2: secondary content in sidebars and Example 3: case studies, doing so will only create inconsistency and confusion, and will negatively impact the navigability of the document.

The document is "badly structured"

Finally, the third objection to this use-case was that the document is just "badly structured". However, it is only badly structured if you define "well structured" as requiring a strictly sequential pattern of headings. Apart from the obvious circularity of this argument, can it really be contended that a document that contains an unnecessary heading is "well structured" only because it contains that unnecessary heading (and that it becomes badly structured if you remove the unnecessary content)? Any rule that brings about such a situation is surely setting itself up as a target for ridicule.

Example 5: annual report

Example 5: annual report comprises two main sections; the first section is six pages long while the second consists of just one page. The document title is once again its one and only H1 and each main section is headed by an H2.

Section 1 is divided into three "topics" ("Directors' reports", "Management comments" and "Remuneration report"), each headed by an H3. Lastly, each topic contains a number of H4 sub-headings. The start of the main content is illustrated in Figure 10 below.

PDF annual report page with headings H2, H3 and H4
Figure 10: Start of main content. Each topic is headed by an H3

Section 2, as might be expected, is also headed by an H2. However, unlike Section 1, this is a single topic section—that topic being "Statement of Internal Control". While this page does also have sub-headings, it has no equivalent of the H3 topic headings found earlier in the document. This is illustrated in Figure 11 below.

PDF annual report page with headings H2 and H4 only
Figure 11: Single topic main section

In shorthand, the heading structure for this document is as follows.

  • H1: Main document title
  • H2: Section 1—Annual Report to the Accounts
  • H3: Directors' reports
  • H4: Sub-sections (x 7)
  • H3: Management comments
  • H4: Sub-sections (x 7)
  • H3: Remuneration report
  • H4: Sub-sections (x 7)
  • H2: Section 2—Statement of Internal Control
  • H?: Sub-sections (x 4)

It can clearly be seen that all the sub-section headings do the same job throughout this document. As a result, all the same issues arise. You will either have to add an editorially unnecessary heading on page 7, or employ an inconsistent tag structure in order to adhere to the no-skipped-headings rule.

Not for visual styling purposes

For the avoidance of doubt, this example in no way constitutes determining heading structure according to visual styling considerations. The objective here is to avoid conveying the message that each two-paragraph section on page 7 does the same job as, or is structurally or editorially equivalent to, the (approximately two-page) "Directors' reports", "Management comments" or "Remuneration report" sections. At the risk of getting repetitive, they just don't and they aren't.

Conclusions

If PDF is different …

If PDF is different in a way that matters in the present context, it is because PDFs typically include secondary or even tertiary content, often residing in sidebars, front matter, case studies and so on. Because of this, if anything, PDF needs more, not less, flexibility than HTML.

Microsoft Word 2010 gets it right

The accessibility checker built into Microsoft Word 2010 gets it right: it flags a skipped heading level with a warning (along the lines of: please check that you have skipped a heading level for a good reason). As stated near the top of this article, it is important to recognise that some documents are quite legitimately structured in such a way that indiscriminately and invariably following a strictly sequential pattern of headings will make for a poorer user experience than would otherwise be the case.

No case made

On the evidence presented to date, the case just has not been made that a blanket ban on skipped headings is justified. This matters because the no-skipped-headings rule in PDF/UA requires that automated PDF accessibility checkers include such a test. A document containing just one skipped heading will fail any such test. However, whether or not your heading structure is appropriate in the context of the document's content is something that absolutely requires human judgment. Like the appropriateness of bookmarks, document title, alt text, language specification and a host of other accessibility requirements, it just cannot be verified by a machine.

The solution

The solution really is simple. Follow the lead of the Microsoft Word 2010 accessibility checker and flag a skipped heading with a warning. Flagging it as an error, no matter what the circumstances, is a problem that undermines the credibility of PDF/UA.
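As a rough illustration of the difference between the two policies, here is a minimal sketch of a heading check whose severity for a skipped level is configurable. The `Finding` class, the `check_headings` and `passes` functions, and the wording of the message are all hypothetical; this does not reproduce the behaviour of Word's checker or of any PDF/UA validator.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str  # "warning" or "error"
    message: str


def check_headings(levels, skip_severity="warning"):
    """Report every descent that skips a heading level."""
    findings = []
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            findings.append(Finding(
                skip_severity,
                f"H{prev} is followed by H{cur}: "
                "please confirm the skip is intentional",
            ))
    return findings


def passes(findings):
    """A document fails only if at least one finding is an error."""
    return not any(f.severity == "error" for f in findings)


# Hypothetical level sequence for Example 5 (annual report),
# with each run of H4 sub-sections collapsed to a single entry.
annual_report = [1, 2, 3, 4, 3, 4, 3, 4, 2, 4]

as_warning = check_headings(annual_report, skip_severity="warning")
as_error = check_headings(annual_report, skip_severity="error")

print(passes(as_warning))  # skip is reported, human decides
print(passes(as_error))    # skip alone fails the document
```

With the warning policy the skip is still surfaced for a human to review, but it does not by itself cause the document to fail.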

Ted Page Director PWS

ted@pws-ltd.com

Registered in England, number 065084100