HTML, File Paths and Metadata

Describing text to the browser, linking to files, and data about data

Feb 28, 2023

What is HTML?

HTML (HyperText Markup Language) is the language we use to describe text semantically (with meaning) to the web browser. Without it, the browser would have no clue that the text above is a heading, or that the text below it is a paragraph.

HTML is written with elements. Most elements wrap their content between an opening tag and a closing tag (many people use “element” and “tag” interchangeably). They look like this:

<h1>Hello World</h1>

It is worth noting that not all elements have closing tags. The HTML standard calls these void elements, though many people call them self-closing tags. In the example below, the input is the one without a closing tag. The label does have one, but I included it because we will return to this example later when we discuss accessibility and HTML attributes. Here is an example:

<label>this is a label</label> <input>

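There is an older style of writing self-closing tags, with a slash before the closing angle bracket. In HTML5, the modern version of HTML, the slash is optional and has no effect on elements like input, so most people leave it out. I will show it to you anyway, just in case you encounter any legacy (old) code. Let’s use the same example from above: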

<label>this is a label</label> <input/>
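
For a slightly bigger picture, here is a minimal sketch of a few other common elements that never take a closing tag. The text is a placeholder I made up for illustration.

```html
<!-- These elements have no content, so they are written without a closing tag. -->
<p>
  First line<br>
  Second line, after a line break
</p>

<!-- hr draws a thematic break between two blocks of content -->
<hr>

<label>this is a label</label> <input>
```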

What can you do with HTML?

Some people consider HTML a declarative programming language, and while you can’t hack NASA with it, you can make some beautiful-looking text documents.

Perhaps due to its simple structure, HTML is important but overlooked. Go ahead and inspect your favorite website using the browser developer tools. I bet you will see a bunch of divs with ugly attributes.

Ever wonder why that is? That’s because when Web Developers learn how to make websites, most rush through HTML thinking it is trivial. That is why the HTML for most websites is a nightmare for screen readers and Web Developers alike to work with.

It seems most Web Developers would rather spend their time mastering vanilla JavaScript or CSS (or more likely, the respective frameworks), because HTML is so easy, right? WRONG!

This is the HTML for a typical Substack post. Did a human write this? Maybe, either by hand or programmatically! Some HTML is generated by a framework, like Bootstrap. While a framework can shorten development time, it can make extending the code difficult due to messy, unsemantic HTML and weird class names.

OK. Now that we have seen what bad HTML looks like, let’s check out the beauty of semantic, good HTML.

The best HTML is semantic

We saw an example of semantic HTML above with the input and label example. Now let’s extend it with another modern element, the section element.

<section>

<h1>this is a section title</h1>

<label>this is a label</label> <input>

</section>

Here is how Mozilla Developer Network (MDN), a wonderful reference for HTML, defines the section element:

The <section> HTML element represents a generic standalone section of a document, which doesn't have a more specific semantic element to represent it. Sections should always have a heading, with very few exceptions.

In my own work, I often use the section tag to group related content together. I am keeping these examples simple since this is a discussion about semantic HTML and we haven’t discussed HTML attributes yet. In the real world, the label and input would be associated with a form tag, and a button tag would be used to submit the form.

Remember the earlier example of the HTML from Substack (the nested divs)? Compare that to semantic HTML. HTML that is semantic isn’t just easier to write and read; it also shows empathy for your users, as we will soon see with screen readers. Semantic HTML is also a good business practice.
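
To make the comparison concrete, here is a small sketch of the same content written twice. The class names and text are invented for illustration and are not taken from Substack’s actual markup.

```html
<!-- Version 1: nested divs. The structure is hidden in made-up class names. -->
<div class="box-1">
  <div class="big-text">My cat blog</div>
  <div class="box-2">
    <div class="medium-text">Why cats are great</div>
    <div class="text">They are fluffy and they purr.</div>
  </div>
</div>

<!-- Version 2: semantic elements. The tags themselves describe the content. -->
<main>
  <h1>My cat blog</h1>
  <article>
    <h2>Why cats are great</h2>
    <p>They are fluffy and they purr.</p>
  </article>
</main>
```

Both versions can be styled to look identical, but only the second one tells a screen reader or a crawler which text is a heading and which is a paragraph.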

Websites with semantic HTML and proper metadata tend to rank higher in search engines, as Google’s web crawlers (the software Google uses to scan websites and determine search ranking) favor websites that are easy to crawl. Think of it as Google’s way of saying “You scratch my back and I’ll scratch yours.”

Making sure your HTML is valid

To make mistakes is to be human. But your mistakes shouldn’t cause your users stress or cost your client business opportunities. It isn’t all that uncommon for a developer to forget to close an HTML tag:

<p>hello world, will you complete me?

<h2>the paragraph above is unclosed, and may render in unexpected ways</h2>

Not closing tags can lead to improperly rendered pages, and create unexpected blank spaces between elements in the markup. Not only that, but when it comes to styling the HTML with Cascading Style Sheets (CSS), styles can spill out into adjacent HTML. Please note that different browsers display HTML differently, so the behavior isn’t the same across the board.

This is because HTML elements are containers for content, and when you don’t tell the browser where an element (or tag, I know I use these terms interchangeably) closes, the browser has to guess, and its guess may not match what you intended!
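
Here is a minimal sketch of a style “spilling” into adjacent HTML. The text is made up, and the comment describes what the HTML parsing rules call for in this case, so it is worth trying in your own browser.

```html
<!-- The <strong> element below is never closed. -->
<p>This is <strong>important.</p>
<p>This paragraph was meant to be normal text.</p>

<!--
  The browser ends the first paragraph at </p>, but it still treats
  <strong> as open, so it reopens it inside the second paragraph.
  The word "important." and the whole second paragraph end up bold.
-->
```

Adding the missing closing strong tag right after “important.” keeps the bold text where it belongs.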

This is where a tool like an HTML validator comes in handy. There are many, but this one is made by W3C, an organization responsible for the standards of the World Wide Web and HTML5.

When using the validator, you can upload an HTML file, or paste an HTML snippet such as the one above. And that’s exactly what I did. Let’s see what it had to say:

1 warning and 2 errors from an HTML validator

After validating, I saw that there was 1 warning and 2 errors. Though I couldn’t find research to support this, I believe errors are more serious (and harder to recover from) than warnings. Luckily for me, the validator told me how to fix the HTML, so let’s go ahead and do that.

A basic HTML5 document showing proper validation.

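Along with fixing the errors, I made the HTML snippet a full HTML document. Here it is. **Note: as you may have noticed, I forgot to close the <p> tag, and the validator still accepted it! That is because the HTML standard makes the closing </p> tag optional when the paragraph is immediately followed by certain elements, such as a heading. I close it explicitly in the version below, since that is easier to read.**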

<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="utf-8">

<meta name="viewport" content="width=device-width, initial-scale=1">

<title>my HTML document</title>

</head>

<body>

<p>hello world</p>

<h2>the paragraph above is now closed</h2>

</body>

</html>

Alright, that’s enough HTML talk. Now let’s discuss attributes.

`HTML` and attributes

HTML attributes are just another way to give control over how elements act. There are a lot of them, and you don’t need to know them all. Web Development is a lot like shopping, picking and choosing what you need for the right occasion and knowing which elements and attributes best serve your audience.

Now, back to our example. Remember the label and input elements from earlier? Let’s add attributes to them:

<label for="user-name">this is a label</label> <input type="text" name="user-name" id="user-name" placeholder="type your name">

The for attribute is used to associate the label with the input. Its value must match the id of the input. This helps screen readers know the user-name label is associated with the user-name input.
The type attribute has a value of text. This tells the browser that the input is a text box.
The name attribute has a value of user-name. This is the name the server uses to identify the data when the input gets submitted.
The id global attribute (can be used on all elements) is a unique value that allows for manipulation by JavaScript and CSS. The value must contain at least one character and cannot contain any spaces. You can read more about the id here.
The placeholder attribute shows hint text inside the empty input, telling the user what information is expected of them.
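
To see these attributes working together, here is a minimal sketch of the same label and input inside a form. The /greet address and the button text are placeholders I made up; the rest comes from the example above.

```html
<!-- /greet is a hypothetical address; replace it with a real one on your server. -->
<form action="/greet" method="get">
  <!-- for="user-name" points at the input whose id is "user-name" -->
  <label for="user-name">this is a label</label>
  <input type="text" name="user-name" id="user-name" placeholder="type your name">

  <!-- A submit button inside a form sends the form when clicked. -->
  <button type="submit">Send</button>
</form>
```

If a user types Marco and presses the button, the browser requests /greet?user-name=Marco. The part before the equals sign comes from the name attribute, not the id.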

`HTML` and accessibility

As mentioned earlier, HTML is inherently accessible when used properly. Certain elements can be interacted with or navigated to (and brought into focus) using only the keyboard. A few of them are:

button
a (when it has an href attribute)
input
iframe
Any element with tabindex="0".
- You can read more about tabindex on MDN.
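
Here is a small sketch of the difference between elements that are focusable by default and one that is made focusable with tabindex. The text and the # link target are placeholders.

```html
<!-- In most browsers these can be reached with the Tab key without any extra work. -->
<button type="button">Save</button>
<a href="#">Read more</a>
<input type="text">

<!-- A div is skipped by the Tab key by default. -->
<div>You cannot tab to me</div>

<!-- tabindex="0" puts the div into the normal tab order. -->
<div tabindex="0">You can tab to me</div>
```

Keep in mind that tabindex only makes the element focusable. It does not make the div behave like a button, so when you need something clickable, a real button is usually the simpler choice.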

To increase accessibility, you can use an attribute called aria-label. The aria-label attribute is used to describe the purpose or function of an interactive element to assistive technology such as screen readers.

Below I’ve included an example of using an aria-label to provide a description of an img tag with an empty alt attribute. The alt attribute provides a description to screen readers and visual text when the image doesn’t load.

<img src="cute-cat.jpg" alt="" aria-label="a fluffy siberian cat with green eyes">

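In this case, since the alt is empty, a screen reader can fall back to reading the aria-label aloud. Keep in mind that the aria-label is never shown on screen, so with an empty alt no text is displayed if the image doesn’t load. For images, a descriptive alt attribute is usually the better choice.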
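
As a sketch of where aria-label tends to be most useful, here is a button with no readable text, next to an image described with alt. The file name and wording are placeholders.

```html
<!-- The × symbol alone says little to a screen reader user, so aria-label names the button. -->
<button type="button" aria-label="Close">×</button>

<!-- For an image, alt covers both cases: screen readers read it aloud,
     and the browser shows it as text if the image fails to load. -->
<img src="cute-cat.jpg" alt="a fluffy siberian cat with green eyes">
```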

To learn more about screen readers, what they sound like when reading HTML, and how the aria-label attribute contributes to accessibility, check out this video by Google Developers.

`HTML` metadata and the `meta` tag

If an HTML document tells the browser how to mark up the data (text), then metadata describes that data. This is done through the use of the meta tag. This tag is located (nested) inside the head tag.

Let’s discuss Open Graph (OG) metadata. When you post a link to a website, for example on Twitter or LinkedIn, information about the website you linked is also displayed. That preview is generated at least partially from the OG meta tags in the HTML, which are part of Facebook’s Open Graph protocol. Here are some meta tags with attributes, including some OG ones:

<meta charset="utf-8">
- Specifies the character encoding of the document as UTF-8 (Unicode Transformation Format 8-bit). This ensures that regardless of the user’s operating system or language settings, text will be rendered correctly. UTF is beyond the scope of this article, though you can read more about UTF here.
<meta name="viewport" content="width=device-width, initial-scale=1">
- Allows the website to adjust its layout to fit the screen size of the device. Sets the initial zoom level of the page to 100%.
<meta name="description" content="…">
- Defines a description of the content of your web page. This is often displayed in search results. The content attribute should contain the actual description and should be no longer than a couple of sentences.
<meta property="og:image" content="path-to-image">
- Specifies the path (absolute or relative, terms to be defined below) for the thumbnail preview image for social media, e.g. Facebook, Twitter, and LinkedIn. In practice an absolute path is the safer choice, since the site reading the tag lives on a different server.
<meta name="keywords" content="…">
- Defines a comma-separated list of keywords for search engines that represent the content of the page.
<meta name="author" content="…">
- Specifies the author of a page. The content attribute should be set to the name or email address of the author.
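
Put together, the head of a page might look something like the sketch below. Every value (the title, the description, the author name, and the example.com address) is a placeholder I made up. The og:title and og:description tags are not in the list above, but they are part of the same Open Graph protocol.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>My cat blog</title>

  <!-- Read mostly by search engines -->
  <meta name="description" content="Stories and photos of a fluffy siberian cat.">
  <meta name="keywords" content="cats, siberian cat, pets">
  <meta name="author" content="Jane Doe">

  <!-- Open Graph tags, read by sites that build link previews -->
  <meta property="og:title" content="My cat blog">
  <meta property="og:description" content="Stories and photos of a fluffy siberian cat.">
  <meta property="og:image" content="https://example.com/images/cute-cat.jpg">
</head>
<body>
  <p>hello world</p>
</body>
</html>
```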

Though some of these tags are meant for display on social media, there is no need to test them on Twitter (or Facebook, Discord, Slack… or anyplace that reads metadata from HTML).

You can preview to some degree of certainty how the metadata will display with a tool like this. I included a screenshot below. Note that some websites will probably display it differently.

A preview of how the different meta tags display information — Using Perpetual Education as an example, a preview of how the different meta tags display information

An important note on properly sizing images for metadata: when preparing images for OG metadata use, make sure images are at least 1200 × 630 pixels (width × height). To help you get the right size, you can use this free tool.

Understanding file paths

6 spaced circles between two bars, floating above 3 spaced bars and a square — A file path represented as abstract shapes. Source: author

File paths are how we link resources. As I discussed in my previous blog post, the World Wide Web is basically a bunch of files linking to each other. We use file paths to achieve this. This is done through two types of file paths:

Relative file paths- A relative file path points to another resource relative to the location of the current file. Relative file paths are typically used to link to resources such as images, stylesheets, and scripts that are located in the same directory (folder) or a subdirectory (a folder within a folder) as the current web page. These file paths are used with attributes such as href. So, href="path-to-file".
Let us discuss the relative file path syntax using a few examples:
- “./index.html” - The “./” means the current directory, so this refers to a file named “index.html” in the current working directory.
  - Alternatively, you could just say “index.html”, which means the same thing. “/index.html” is different: a leading slash starts the path from the root (top-level) directory of the website, no matter where the current file is.
- “../parent-directory/cute-cat.jpg” - The “../” means “go up one level”. This refers to a file named “cute-cat.jpg” in a directory called “parent-directory” that is one level up from the current working directory.
  - Look at the previous example. If “.” represents the current directory, then “..” represents the directory one level up (the parent directory).
- “../../parent-directory/grandparent/index.html” - This path goes two levels up from the current working directory, then into a directory called “parent-directory”, then into a directory called “grandparent” inside it, where the file named “index.html” is located.
  - If you feel lost, look at the previous example. If “../” represents the directory one level up from the current directory, then “../../” is the directory two levels up.
    - Relative file paths trip up even experienced developers, so try not to feel too bad if you don’t fully understand. This post will be updated in the future with better examples and diagrams.
Absolute file paths- Absolute file paths are used to link to other resources, usually on a different website or server (but not always; sometimes they point to the current server, as you can see if you look at the Substack URL, for example). These resources can be other web pages, images, or other types of files. An absolute file path is made up of:
- The protocol, like http:// or https:// .
- The domain name or IP address of the server where the resource is located.
- The path to the resource.
Let’s use the Perpetual Education website as an example of an absolute file path.
https://perpetual.education/blog/
Recognize the pattern? This link points to a page called blog on the domain name perpetual.education using the https:// protocol.
- Here’s what it looks like in practice, using an a tag and href attribute:
  - <a href="https://perpetual.education/blog/">blog</a>
    - Note: anything can go between the opening and closing tags, but it is best to make the text relevant to the linked page.
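
To tie the two kinds of paths together, here is a sketch that assumes a made-up folder layout. The folder and file names are placeholders, and the page doing the linking is assumed to be blog/posts/hello.html.

```html
<!--
  Assumed folder layout (hypothetical):

  index.html
  images/
    cute-cat.jpg
  blog/
    index.html
    posts/
      hello.html     (the file these links are written in)
      goodbye.html
-->

<!-- Same directory: blog/posts/goodbye.html -->
<a href="./goodbye.html">goodbye post</a>

<!-- One level up: blog/index.html -->
<a href="../index.html">blog home</a>

<!-- Two levels up, then into images: images/cute-cat.jpg -->
<img src="../../images/cute-cat.jpg" alt="a fluffy siberian cat with green eyes">

<!-- Leading slash: starts at the root of the site, so this is the top-level index.html -->
<a href="/index.html">site home</a>

<!-- Absolute path: protocol, domain name, then the path to the resource -->
<a href="https://perpetual.education/blog/">Perpetual Education blog</a>
```

If hello.html were moved to a different folder, the first three paths would need to change, while the last two would keep working.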

From HTML and attributes to metadata and file paths, we covered a lot in this blog post. If I got anything wrong or you have any questions, write a comment!

Marco’s Design Journey