Lab 1
Introduction to the Web

DSC 106: Introduction to Data Visualization

These labs dogfood

All slides and lab materials are built using open web technologies.

Lots of additional information in the notes, lots of outgoing links.

How does the Web work?

The URL

The Web’s primary innovation

https:// dsc106.com /labs/1/hello.html
What many people don’t realize is that the Web was primarily a usability innovation, and the URL was the centerpiece of that. Before the Web, we still had computers connected to the internet that we could access remotely. However, accessing a resource involved several steps: connecting to the remote computer, navigating to the directory where the resource was, and then downloading it. The URL encoded all the information needed to retrieve a resource into a single string that could be copied, pasted, and even clicked.

Let’s take a look at an example URL to a file on our course website. It starts with the protocol, which is the language that the client and server will use to communicate. This is usually `https` nowadays, which is a secure version of the `http` protocol. Then, we have the host or authority, which is the computer that hosts the resource we want to access. Finally, we have the path, which is the location of the resource on the host.
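To make the three parts concrete, here is a minimal sketch that splits the example URL using the standard `URL` API, which is available in browsers and in Node.js. The `parts` variable name is ours, chosen only for illustration.

```javascript
// Split the example URL from this slide into the parts described above.
const url = new URL("https://dsc106.com/labs/1/hello.html");

const parts = {
	protocol: url.protocol, // the API includes the trailing colon
	host: url.host,
	path: url.pathname,
};

console.log(parts.protocol); // "https:"
console.log(parts.host); // "dsc106.com"
console.log(parts.path); // "/labs/1/hello.html"
```

You can paste this into the dev tools console of any page to try it with other URLs.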

Relative URLs

When specifying URLs for links or resources within a website, we typically don’t want to repeat things like the protocol and host on every single one. Not only is this tedious and bloated, it also creates a lot of work if we ever want to change the domain name, and it makes it hard to test sites locally. Instead, we can use *relative URLs*, which are URLs that are resolved relative to another URL.

- Within HTML, relative URLs are interpreted relative to the URL of the current page.
- Within CSS, relative URLs are interpreted relative to the URL of the CSS file.
- Within JS, relative URLs in `import` statements are interpreted relative to the URL of the JS file, while most other APIs (e.g. `fetch()`) resolve them relative to the URL of the current page.

In most cases relative URLs do what you’d expect, but there are a few things to keep in mind:

- If a relative URL starts with a `/`, it is relative to the root of the host.
- `..` means “go up one level”.
- `.` refers to the current directory.

Read more: [Relative URLs on MDN](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_URL#absolute_urls_vs._relative_urls)
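The resolution rules above can be tried out with the two-argument form of the standard `URL` constructor, which resolves a relative URL against a base. This is only a sketch: the `resolve` helper and the file names `images/cat.png`, `first.html`, and `/syllabus/` are made up for illustration.

```javascript
// Pretend this is the URL of the page that contains the links.
const base = "https://dsc106.com/labs/1/hello.html";

// Hypothetical helper: resolve a relative URL against the page URL.
const resolve = (relative) => new URL(relative, base).href;

console.log(resolve("images/cat.png")); // https://dsc106.com/labs/1/images/cat.png
console.log(resolve("./first.html")); // https://dsc106.com/labs/1/first.html
console.log(resolve("../2/")); // https://dsc106.com/labs/2/
console.log(resolve("/syllabus/")); // https://dsc106.com/syllabus/
```

Note how only the URL starting with `/` ignores the directory of the current page.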

Local URLs Part 1

The file: protocol

file:// /Users/giorgianicolaou/Documents/dsc106/labs/1/hello.html
Let’s look at a different example. Download the file `hello.html` from the course website and double click it to open it in your browser. You will probably see a URL similar to this one. The `file:` protocol allows you to open files from your local filesystem and view them in a browser. Note that there is no host (since `file:` protocol URLs always refer to your own computer) and that the path starts from the root of your filesystem. This means that a malicious page opened this way could potentially access any file on your computer, which is why the `file:` protocol is considered unsafe and is very locked down today.

We will use the `file:` protocol for now, but in future labs, we will learn how to run a _local server_ to view our files, which is a lot more flexible.

Read more:

- ["What is a URL?" on MDN](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_URL)
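As a quick check of the “no host” observation, the same standard `URL` API can parse the `file:` URL from this slide. The variable name `fileUrl` is ours.

```javascript
// Parse the local URL shown on this slide.
const fileUrl = new URL(
	"file:///Users/giorgianicolaou/Documents/dsc106/labs/1/hello.html"
);

console.log(fileUrl.protocol); // "file:"
console.log(fileUrl.host === ""); // true: there is no host
console.log(fileUrl.pathname); // starts at the root of the filesystem
```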

Computers connected to the Web are clients, servers, or both

Clients & Servers

- A web server runs software that listens for requests for resources and responds with said resources (or an error)
- Clients are the computers making the requests
- The same computer can be both a client and a server

Client

					GET /labs/1/hello.html HTTP/1.1
					User-Agent: Mozilla/5.0 Firefox/71.0
					Host: dsc106.com
					Accept-Language: en-us
					Accept-Encoding: gzip, deflate
					Connection: Keep-Alive
				
🖥
Server

					HTTP/1.1 200 OK
					Date: Sun, 29 Dec 2019 17:46:48 GMT
					Server: Apache/2.4.7 (Ubuntu)
					Last-Modified: Wed, 15 Nov 2017 21:16:50 GMT
					Content-Length: 6471
					Keep-Alive: timeout=5, max=100
					Connection: Keep-Alive
					Content-Type: text/html

					<!DOCTYPE html>
					<title>Hello world</title>
					<h1>Hello world!</h1>
					<p>This is my <em>first</em> web page! 🤯</p>
				
What happens when we enter a URL and hit Enter or click a link?

1. First, the browser parses the URL and sends an *HTTP request* to the *web server*.
2. The web server reads that and sends back an appropriate *HTTP response*, which contains metadata about the resource plus content. For webpage URLs, this content is usually in a language called HTML. That HTML may be read from a static `.html` file that the server is hosting, or even generated on the fly via a programming language running on the server.
3. The browser parses this response and starts rendering the page, firing off more requests as needed.

Read more:

- ["What is a web server?" on MDN](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_web_server)
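To see how the “metadata plus content” structure of a response looks in practice, here is a rough sketch that splits a raw HTTP/1.1 response into its status, headers, and body. The `parseResponse` helper is hypothetical and the response text is an abbreviated version of the one on the slide. Real browsers handle many more cases (compression, chunked bodies, repeated headers, and so on).

```javascript
// An abbreviated version of the response on the slide. Lines end with CRLF.
const rawResponse = [
	"HTTP/1.1 200 OK",
	"Server: Apache/2.4.7 (Ubuntu)",
	"Content-Type: text/html",
	"",
	"<!DOCTYPE html>",
	"<title>Hello world</title>",
	"<h1>Hello world!</h1>",
].join("\r\n");

// Hypothetical helper, for illustration only.
function parseResponse(raw) {
	// A blank line separates the metadata (status line + headers) from the content.
	const separator = raw.indexOf("\r\n\r\n");
	const head = raw.slice(0, separator);
	const body = raw.slice(separator + 4);

	const [statusLine, ...headerLines] = head.split("\r\n");
	const [version, code, ...reason] = statusLine.split(" ");

	// Header names are case-insensitive, so we store them in lowercase.
	const headers = {};
	for (const line of headerLines) {
		const colon = line.indexOf(":");
		headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
	}

	return { version, status: Number(code), reason: reason.join(" "), headers, body };
}

const response = parseResponse(rawResponse);
console.log(response.status, response.reason); // 200 OK
console.log(response.headers["content-type"]); // text/html
```

In a real page you would not parse responses by hand: the browser does this for you, and the Network tab of the dev tools shows the result.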

The Web: Open by design

The Web was designed to be open: you can view the code of any website, and even modify it on the fly! There are two browser features for this:

- _View Source_ shows you the HTML code of the page, as it was sent by the server. To use this, right click on the page and select _View Source_, or press `Ctrl+U` (`Cmd+Option+U` or `Cmd+U` on Mac, depending on the browser).
- _Dev Tools_ allow you to inspect the current state of all code being used to create the page, and even modify it on the fly. You can also view all HTTP requests and responses, and even simulate different network conditions. Note that viewing requests requires reloading the page.

To open dev tools:

- On Mac: press `Cmd+Option+I`
- On Windows: press `F12` or `Ctrl+Shift+I`
- From the menu bar: View → Developer → Developer Tools
- In many browsers you can also right click anywhere on the page and select _Inspect Element_

Note that this can be a double-edged sword: while it can be very educational to look at how existing websites are built, there are many, *many* websites that do not follow best practices, or do things in a suboptimal way. Often this is a result of their age, or the tradeoffs they had to make (e.g. performance or browser support over maintainability).

The Web Platform

HTML

Structure


				<h1>Hello world!</h1>
				<p>This is my
					<em>first</em> web page! 🤯
				</p>
			

CSS

Presentation


				body {
					font: 100%/1.5 system-ui;
				}

				h1 {
					color: deeppink;
					font-size: 300%;
				}
			

JS

Behavior


				document.onclick = event => {
					alert(`You clicked at
					${event.x}, ${event.y}`);
				};
			
The collection of technologies that we use to create websites and web applications is called [The Web Platform](https://en.wikipedia.org/wiki/Web_platform#:~:text=The%20Web%20platform%20is%20a,Task%20Force%2C%20and%20Ecma%20International). It consists of three main technologies: HTML, CSS, and JavaScript.

- HTML is used to specify the structure of the content, for example headings, paragraphs, lists, etc.
- CSS is used to specify how content *looks*, for example colors, fonts, spacing, layout, etc. CSS will be the focus of the next lab, so we will learn a lot more about it then.
- Finally, JavaScript is used to specify how content *behaves*, for example what happens when you click a button, or when you scroll the page, as well as for automation.
- Ideally, these should be designed to be as independent as possible (separation of concerns). [CSS Zen Garden](https://csszengarden.com/) was built to demonstrate this point by showing how the same HTML can be styled in completely different ways by different CSS files, and significantly contributed to CSS’s adoption at the time.

You can view a web page with the HTML, CSS, and JS from this slide at [`labs/1/first.html`](../first.html)

Similarities & differences

<h1> Hello world! </h1>
Webpages consist of *HTML elements*, each of which is a unit of meaning. Syntactically, HTML elements are made up of *tags*, which are sequences of characters enclosed in angle brackets. The HTML between the starting and ending tag is called the *content* of the element, and may contain raw text or other HTML elements. Many HTML elements do come with some default styling, for example a heading element will be rendered in bold and a larger font size. However, using HTML elements just to style content is an antipattern: that’s what CSS is for!

			<img src="images/baby-yoda.jpg"
				alt="Baby Yoda with a serious expression" />
		
- We can also specify metadata on each element, by specifying *attributes* on the starting tag, which are key-value pairs. Some attributes that only turn certain things on/off do not take a value.
- Some elements do not take any content, only attributes.
- Some of these elements do not need a closing tag, but not all. For readability, we can include a trailing slash to indicate that the element does not have a closing tag (`<img />`) but this is not required. We will use that convention so that you can recognize these elements more easily.

Here we see an example of the `<img>` element, which is used to embed images in a webpage. The `src` attribute specifies the URL of the image to embed, while the `alt` attribute specifies a text description of the image for people who are visually impaired and rely on screen readers to browse the web. Note that `<img>` is one of the many HTML elements that cause the browser to fire *additional* HTTP requests.

HTML at the root of it all

CSS


				<link rel="stylesheet" href="style.css" />
			

				<style>
					h1 {
						color: deeppink;
					}
				</style>
			

JS


				<script type="module" src="hello.js"></script>
			

				<script type="module">
					alert("Hello world!");
				</script>
			
- Every webpage consists of HTML. It is the HTML that includes all other languages.
- CSS and JS can be either linked from separate files or embedded in the HTML file itself.
- Embedding them can be useful for prototyping, but it’s generally better to keep them separate so they can be used by multiple pages and cached separately.
- Note that it is possible to have `<script>` elements without `type="module"`, but we recommend you always use `type="module"` as it enables a few modern JS features and prevents certain errors.

Each language imports subresources

CSS


				@import url("colors.css");
			

				<style>
					@import url("style.css");
				</style>
			

JS


				import * as util from "./utils.js";
			

				<script type="module">
					import "./hello.js";
				</script>
			

			h1 {
				color: deeppink;
			}
		
CSS is largely a collection of *rules*, each of which consists of a *selector* and one or more *declarations*.

- The _selector_ specifies which elements the rule applies to. The simplest selector is the one shown here that applies to all elements of a certain type, in this case `h1` elements. However, CSS selectors go a lot deeper than that, and can query elements based on attributes, relationships to other elements, user activity, and more.
- _Declarations_ are key-value pairs, where the key is the _property_ and the value is the _value_ of that property. Each declaration sets one aspect of the style of the element.
- Yes, it is possible for declarations from multiple rules (or even the same rule) to conflict. There is an entire conflict resolution algorithm to determine which one wins in these cases, called The Cascade, which is where CSS gets its name from (Cascading Style Sheets). We will cover this briefly in the next lab.
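To make the anatomy of a rule concrete, here is a toy sketch that splits one simple rule into its selector and declarations. The `parseRule` function is hypothetical and deliberately naive: it assumes a single rule with no comments, nesting, or strings containing `;` or `}`. It is not how browsers parse CSS.

```javascript
// Hypothetical toy parser for ONE simple CSS rule (illustration only).
function parseRule(css) {
	const open = css.indexOf("{");
	const close = css.lastIndexOf("}");

	// Everything before the opening brace is the selector.
	const selector = css.slice(0, open).trim();

	// Everything between the braces is a list of property: value pairs.
	const declarations = css
		.slice(open + 1, close)
		.split(";")
		.map((text) => text.trim())
		.filter(Boolean) // drop the empty piece after the last semicolon
		.map((text) => {
			const colon = text.indexOf(":");
			return {
				property: text.slice(0, colon).trim(),
				value: text.slice(colon + 1).trim(),
			};
		});

	return { selector, declarations };
}

const rule = parseRule("h1 { color: deeppink; font-size: 300%; }");
console.log(rule.selector); // "h1"
console.log(rule.declarations.length); // 2
```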

Error handling

HTML is forgiving

It tries to fix your mistakes


				<em>Emphasized
					<strong>Both
				</em> Important</strong>
			

				<em>Emphasized
					<strong>Both</strong>
				</em>
				<strong>Important</strong>
			

CSS is forgiving

It ignores your mistakes


				h1 {
					color: slategray;
					foobar: yolo;
				}
			

				h1 {
					color: slategray;
				}
			

JS is strict

It stops at the first mistake


				alert("Hello world!"
			
Uncaught SyntaxError: missing ) after argument list
In fact, this forgiving nature of HTML and CSS is core to the Web’s success. Attempts to define a stricter version of HTML (XHTML) failed miserably. The strictness of JS is also one reason why it’s generally a bad idea for your webpage to depend on JS for essential content.
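You can observe the strictness of JS without breaking a page by asking the engine to parse the broken snippet from the slide and catching the error. This is only a sketch: it uses the built-in `Function` constructor purely as a way to parse source text, and the variable names are ours. The exact error message differs between browsers, so we only check the error type.

```javascript
// The broken snippet from the slide: the closing parenthesis is missing.
const broken = 'alert("Hello world!"';

let caught = null;
try {
	// new Function() parses the source without running it,
	// so the syntax error surfaces here and can be caught.
	new Function(broken);
} catch (error) {
	caught = error;
}

console.log(caught instanceof SyntaxError); // true
console.log(caught.name); // "SyntaxError"
```

Because the error happens while parsing, none of the code in the broken snippet gets to run, not even the parts before the mistake.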

Dev Tools & Errors

Emmet’s HTML skeleton ! Tab


			<!DOCTYPE html>
			<html lang="en">
			<head>
				<meta charset="UTF-8">
				<meta name="viewport" content="width=device-width, initial-scale=1.0">
				<title>Document</title>
			</head>
			<body>

			</body>
			</html>
		
You have seen in Lab 0 that Emmet creates a basic HTML page skeleton for you when you press `!` and then `Tab`. But what do all these elements do? Do we need them?

- `<!DOCTYPE html>` is the *document type declaration*, which tells the browser that this is a modern HTML document. It is not actually an HTML element.
- `<html>` is the root element of the document, and contains all other elements.
- The `lang` attribute specifies the language of the content, which is useful for screen readers and other accessibility tools, as well as several web platform features (e.g. hyphenation). It’s available on every element, not just the root.
- [`<head>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/head) contains metadata about the page, such as the character encoding, the page title, and more. It is not displayed on the page.
- The [viewport meta tag](https://developer.mozilla.org/en-US/docs/Web/HTML/Viewport_meta_tag) controls parameters for mobile rendering and also tells the browser not to apply the kinds of heuristics it uses for pages not optimized for mobile.
- `<title>` is the title of the page, which is shown in the browser tab. It is the only element that is actually required (corollary: `<!DOCTYPE html><title>Hey</title>` is a [valid](https://validator.w3.org/nu/#textarea) HTML document).
- `<body>` contains the visible content of the page.
- While technically `<html>`, `<head>`, and `<body>` are optional, they are automatically inserted by the browser if not specified (as you can verify yourself by inspecting [`hello.html`](../hello.html)).

HTML elements

## Markup / Content

- Headings (`<h1>`, `<h2>`, `<h3>`, `<h4>`, `<h5>`, `<h6>`)
- Paragraphs (`<p>`)
- Lists (`<ul>`, `<ol>`, `<li>`, etc.)
- Tables (`<table>`, `<thead>`, `<tr>`, `<td>`, etc.)
- Sectioning (`<header>`, `<footer>`, `<main>`, `<section>`, `<article>`, `<nav>` etc.)
- Inline markup (`<em>`, `<strong>`, `<code>` etc.)
- Figures (`<figure>`, `<figcaption>`)

## Interactive

- Forms (`<form>`, `<input>`, `<textarea>`, `<button>`, etc.)
- Links (`<a>`)
- Progressive disclosure (`<details>`, `<dialog>`)

## Resources / Embedding

- Multimedia (`<img>`, `<video>`, `<audio>`, `<svg>`, etc.)
- External resources (`<link>`, `<script>`, etc.)
- Embeds (`<iframe>`, `<object>`, etc.)

The purpose of markup elements is to mark up content, usually by assigning semantics. Interactive elements, on the other hand, provide functionality.

[Full list of HTML elements](https://developer.mozilla.org/en-US/docs/Web/HTML/Element)