Application Layer

All of the code illustrations on this page are written in pseudocode.

Under the OSI model, the application layer consists of all the user-facing network programs we're familiar with: email, web browsing, online gaming, teleconferencing, and so on. This chapter provides an overview of the application layer.

Application Architectures

Network applications can be classified in terms of their architecture. The three most common architectures are client-server, peer-to-peer (P2P), and hybrid (a combination of client-server and P2P).

Client-server Applications

Under the client-server architecture, applications are built with the following assumptions (a minimal sketch follows the list):

  • There are two kinds of systems, servers and clients.
  • The network program lives in the client.
  • To get data from another system on the network, the client sends a request to the server, which then informs the relevant system of the request. If the relevant system decides to transmit the data, it responds to the server with the data, and the server then transmits the data to the client. Clients never talk to one another directly.
  • The server is always on, and it has a non-changing IP address.
  • When the server receives a request, it returns a reply. That reply may consist of data, or a message indicating failure.
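
To make the request-reply loop concrete, here's a minimal sketch in pseudocode (every name in it, from listen_at to lookup, is invented for illustration):

// server: always on, at a fixed address, waiting for requests
function run_server() {
	let server = listen_at("203.0.113.7", 80)    // hypothetical fixed IP address and port
	while true {
		let request = server.accept()            // block until a client sends a request
		let data    = lookup(request.path)
		if data != null {
			request.reply(data)                  // success: reply with the data
		} else {
			request.reply("not found")           // failure: reply with an error message
		}
	}
}

// client: connects only when it wants something
function run_client() {
	let reply = send_request("203.0.113.7", 80, "/index.html")
	display(reply)
}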

Many of the most widely-used network applications follow the client-server architecture: web browsing, media streaming, email, and so on.

P2P Applications

Under the P2P architecture, applications are built with the following assumptions:

  • All nodes on the network are called peers.
  • To get data from another system, a system directly asks the relevant system.

P2P applications are highly scalable, but very difficult to manage. Classic examples of P2P applications include BitTorrent and the former LimeWire. P2P applications are often further classified as either pure P2P or impure P2P. This classification stems from how P2P applications address the issue of unavailability.

The problem of unavailability is best revealed by way of an example. Suppose A, B, and C are three separate systems, each with data the others want: A has math textbook PDFs, B has MP3 files of some math podcasts, and C has some MP4 videos of math lectures. The three systems decide to form a network where they can each freely transfer data among one another without restriction. These transfers are made by a small program the three systems wrote called Lemonwire, installed on each system.

The Lemonwire program is simply a folder with three subfolders: A, B, C. The A folder contains A's PDFs, the B folder contains B's MP3s, and the C folder contains C's MP4s. As long as the system hosting the data is turned on, the other systems can look inside Lemonwire and access that data. Altogether, the Lemonwire program is considered a pure P2P application.

All goes well, until one day, A suddenly goes offline for maintenance. Weeks go by, and A still hasn't come back. During that time, B and C have no access to books. When A returns, the three systems decide to have a meeting. They agree that the weeks without books were unacceptable, and that they need a way to ensure it doesn't happen again. Their solution: We'll set up a fourth system D that (1) all of us have access to, and (2) contains all of our data. We'll then rewrite Lemonwire to contain only a single folder, D. With this change, Lemonwire is now an impure P2P application.

Things go smoothly, and both B and C take their much-needed repair vacations. Then one day, D breaks down. Now none of them have books, music, or movies. Once again they call a round table, and come up with another solution: Let's get more systems on our network. Because they'll be accommodating more systems, A, B, and C rewrite Lemonwire: If a system wants some data, it sends a request for the data to the systems it's linked to. If a system on the network has the requested data, it responds with the data.

A, B, and C, however, don't have the money to hook all of these systems to their network. So, they decide to hook onto the Internet. They then upload Lemonwire to GitHub, and advertise it as "the math knowledge network." Now they're back to a pure P2P application.

Lemonwire works well, but it's fundamentally limited: A system can only get data from systems it has links to. For example, let's say Lemonwire's network grows steadily, becoming a graph of seven nodes, A through G (picture A linked to B and C, with G several links away).

In this graph, A has no link to G. There's no way for A to get G's data, and there's no way for G to get A's data. Why? Because A doesn't know G's IP address. With just A, B, and C, this was simple enough: they all had links to one another, and it was easy to keep track of each other's IP addresses. But Lemonwire has changed. The nodes are now linked through the Internet, and their IP addresses are constantly changing. At this point, A, B, and C must balance the tradeoffs of network topologies.

One solution to this problem is to have one node that's connected to all of the other nodes (a star topology). This makes the architecture look more like the client-server model, but not quite (the nodes can still talk to one another directly). If this approach is taken, though, Lemonwire runs into all of the problems we saw with the client-server model: heavy traffic to a single node, and potentially disastrous consequences if that well-connected node goes offline.

Another solution is to have all of the nodes connected to one another (a mesh topology). This achieves robustness, but it's unlikely that A, B, and C have the money and human resources to implement and maintain that kind of network.

Yet another solution is to change the way requests are sent. In all of the preceding approaches, A, B, and C were directly sending requests to one another. Instead of this method, they could just flood the network with the request. That is, the request is sent to every single node on the network: A sends it to B and C, and B and C immediately send the request to all of their neighbors. There are several problems with this approach. The first is privacy. Perhaps A wants a file on applied mathematics, but doesn't want everyone else on the network to know about that interest. The second problem is processing waste. With flooding, the nodes forwarding the request have no idea whether the sender got what they requested. A could have received the data they requested 10 minutes ago, but nodes elsewhere are still forwarding the request further. The classic example of a flooding application is Gnutella.
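
As a rough sketch of flooding (the names here are invented; real flooding protocols like Gnutella's also attach a hop limit so requests eventually die out, which this sketch omits):

// run on every node whenever a request arrives
function on_request(request) {
	if have_file(request.filename) {
		send_to(request.origin, get_file(request.filename))   // answer the original sender
	}
	// note: we forward regardless; we have no idea whether the origin was already served
	for neighbor in neighbors {
		if neighbor != request.sender {
			forward(neighbor, request)
		}
	}
}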

One way to get around these costs is through distributed hashing. The idea here is this: Whenever a new node joins the network, it submits a list of all the files it can share, along with its IP address. That information is placed in a table called a distributed hash table, which every node on the network holds. If a node wants a particular file, it searches inside its distributed hash table. If it finds the file, it gets the hosting system's IP address, and sends a request there. The problem with this approach: That hash table is potentially too large to fit on a consumer system like a laptop.
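
In its simplest form, the table is just a mapping from file names to the IP addresses of their hosts. A minimal sketch (the entries and helper names are made up):

// every node holds a copy of the table: file name -> host IP address
let DHT = {
	"calculus.pdf":      "198.51.100.14",
	"topology-talk.mp3": "203.0.113.89"
}

function fetch(filename) {
	if DHT.has(filename) {
		let ip = DHT.get(filename)            // who hosts the file?
		return send_request(ip, filename)     // ask that host directly
	}
	return null                               // not in the table: no known host
}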

Because of this memory constraint, the distributed hashing approach is instead implemented only partially: Each new node still provides a list of the files it can share, but only some of the nodes on the network actually hold the hash table (perhaps a large data server somewhere). When a node on the network wants certain data, it searches the hash table hosted on the data server. The costs of this approach: It's extremely tricky to implement and must be constantly monitored. The servers have to be spread out in such a way that the packets comprising the search query don't take too long. One of the first applications to centralize its lookup table this way was Napster. Its success and expansive network cemented distributed hashing's place as a popular implementation today, and sparked the U.S. government's interest in regulating online piracy.

Hybrid Applications

Some applications are implemented with a hybrid architecture: A combination of client-server and P2P. Examples of hybrid applications include Skype, Internet Relay Chats (IRC), and various Internet telephony applications.

With the hybrid architecture, there's a single server that keeps track of which nodes are on the network. If a node A wants data from a node B on the network, it sends a request to the server. Upon receiving the request, the server checks if B is connected to the network. If B is connected, the server establishes a direct connection between A and B. Otherwise, no connection is made, and A is out of luck.
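
A sketch of the server's bookkeeping (directory and connect_peers are invented names):

// the server only tracks who's online; the data itself flows peer to peer
function on_lookup(request) {
	let b = directory.get(request.target)     // hypothetical directory of connected nodes
	if b != null and b.online {
		connect_peers(request.sender, b)      // introduce A and B; they transfer directly
	} else {
		request.reply("target offline")       // B isn't connected: A is out of luck
	}
}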

Processes

As we mentioned several sections ago, when we say that a node communicates with another node, what we really mean is that a process on one system communicates with a process on another system. A process is a program running within a system — Chrome, Firefox, Spotify, Windows Update, the Apple App Store, etc.

When a process communicates with a process in the same system, we call that communication an interprocess communication. For example, if we download a PDF file from Chrome and Adobe Acrobat opens after the download completes, we have an interprocess communication between two programs on the same system, namely Chrome and Adobe Acrobat. The link between these two programs is the operating system.

When a process communicates with a process on a different system, we call that communication a message. For example, if we enter cnn.com in Safari and an HTML file loads, there is a message transmitted between our laptop and the server that hosts the HTML file. In this case, there are two processes communicating with one another: a client process (the tab open in Safari) and a server process (perhaps some Apache program that's constantly waiting for, and processing requests).

Sockets

As we know, we can write programs that take user input. We can also write programs that send data to another program. Some parts of a program only take input. These are called entry points. Other parts of the program only return outputs, and we call these exit points. However, there may also be parts of the program where data can both enter and exit. We call these access points.

Network applications use a special kind of access point called a socket. Sockets are characterized by the use of a particular protocol — at the application layer, a set of rules and procedures that impose requirements for entry and exit.

One such requirement is addressing. Data can only get to a process if the operating system knows which process to send it to. And data can only get to an operating system if the sending node knows which system to send it to. And data can only get across the network if each router along the way knows which router to send it to next. Thus, we can only transmit data to and from different systems if we have a way to address all of these different entities. This is where protocols come in.

To identify a process, we give each process a port number. Once we give a process that port number, we say that the process listens at that port number. For example, an HTTP server process (e.g., the program that processed our CNN.com request) listens at port number 80. A mail server process (e.g., the Gmail program that processes our sent and received emails) listens at port number 25. Which numbers go to which process is determined by an addressing protocol.
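
As a rough sketch of what "listening at a port" might look like (the socket, bind, and listen names here are illustrative, not tied to any real library):

// a hypothetical HTTP server process claiming port 80
let server_socket = new socket()
server_socket.bind(80)       // tell the OS: deliver data addressed to port 80 to this process
server_socket.listen()       // the process now "listens at" port 80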

Addressing protocols take care of locating. But there's also the issue of message formatting. Because processes live on potentially entirely different systems with entirely different implementations, it's unlikely that the raw data coming out of a sending process can be understood by the receiving process. RFCs (Requests for Comments) provide a common format by establishing rules on the following points:

  1. What are the types of messages exchanged? E.g., request and response.
  2. What is the message syntax? I.e., what fields must the data contain, and how are they delineated?
  3. What are the message semantics? I.e., how should the receiver/sender interpret the data in those fields?
  4. What do I do if $x$ happens? These are rules about how a process should request and respond to messages.

Examples of RFCs include HTTP and SMTP (more on this later). We have rules for locating and rules for formatting. What else do we need? Well, we also need rules for message transport.

Network programs are built to provide some sort of network application, and those applications vary. Some applications are tolerant of small amounts of data loss (e.g., music streaming; there's only so much the human ear can pick up). Other applications have zero tolerance (e.g., file sharing). Some applications are tolerant of delays (e.g., emailing), others are intolerant (e.g., live online gaming).

A third parameter separate from loss and timing is bandwidth. Some applications use very little bandwidth (Internet telephony), and others use whatever bandwidth they can get their hands on (often called elastic applications; think Dropbox and OneDrive). As we've seen repeatedly, data loss and timing are often tradeoffs of one another. Bandwidth, however, is an entirely separate matter. To illustrate, consider the following scenario:

Devin, a TikTok addict, is looking for an Internet service provider. Two companies, A and B, approach her. A says, "We can give you 1000 bits per second." B says, "We can give you 1 millisecond packet latency." Should Devin go with A or B?

Devin should go with B. Why? Because B's offer is a much stronger guarantee than A's. A guarantee of 1 millisecond packet latency bounds when every packet arrives: Whatever data Devin is sent, each piece of it reaches her within 1 millisecond of being sent. A's guarantee of 1000 bits per second, however, is only an average rate; it says nothing about when, within any given stretch of time, those bits actually arrive.

Why? Let's consider an alternative hypothetical. Developer Devin is comparing two different, somewhat unusual, job offers from two companies, X and Y. X says, "We provide 1 vacation per year, spaced a year apart." Y says, "We provide 10 vacations over 10 years." These sound similar, but they're different statements. Setting aside potential labor law issues, Company Y might require Devin to work for a little over 9 years without a single vacation, then grant all 10 at the end. Company X, however, guarantees that Devin gets 1 vacation in each and every year.

The same idea applies to the two ISP offers. B's offer guarantees that data arrives steadily, packet by packet, within a millisecond of being sent. For applications like video streaming, this is desirable: The video plays as bits come in. A's offer, however, only promises an average; it could be the case that all 1000 bits arrive at the very last moment, and the videos Devin wants to watch must buffer before they start playing.

Below is a comparison of some common applications:

Application             Tolerates Data Loss?   Bandwidth             Tolerates Delays?
file transfer           no                     elastic               yes
email                   no                     elastic               yes
static website          no                     elastic               yes
real-time audio/video   yes                    5 kbps-1 Mbps         very little (100s of milliseconds)
stored audio/video      yes                    5 kbps-1 Mbps         very little (a few seconds, because of buffering)
gaming                  yes                    a few kbps upstream   very little (100s of milliseconds)
instant messaging       no                     elastic               yes and no (depends on the users; some are fine with slow replies)

A developer can ensure their application's requirements are met by choosing a transport protocol that matches those requirements. The two most common transport protocols are TCP and UDP:

Application              Application Layer Protocol            Transport Protocol
email                    SMTP                                  TCP
remote terminal access   Telnet                                TCP
web browsing             HTTP                                  TCP
file transfer            FTP                                   TCP
video/sound streaming    proprietary (e.g., RealNetworks)      TCP or UDP
Internet telephony       proprietary (e.g., Vonage, Dialpad)   TCP or UDP

Now that we have a general idea of how the application layer works, let's look at some of the most common web applications.

Web Browsing

To help our discussion, let's define a few terms upfront. A web page is a collection of data objects. Crudely, we can think of it like a struct in C, where the fields are references (weak pointers) to data. What we're analogizing to a struct is a base HTML file:

index.html {
	text,
	image,
	text,
	image,
	video,
	audio,
	javascript code
}

Each object in the web page can be identified with an address, called its URL.

www.foobar.com/hello.jpeg

In the URL above, www.foobar.com is the webpage's host name, and everything thereafter is called the path name. The applications that request, receive, and display web page objects are called web browsers. The applications that send these objects in response to the browser requests are called web servers. Below are some examples.

Web Browsers: Chrome, Firefox, Safari, Opera, Vivaldi, Min, Edge, Internet Explorer
Web Servers: Nginx, Apache Server, Apache Tomcat, Cloudflare Server, LiteSpeed, OpenResty, Google Server, Microsoft-IIS

Both web browsers and web servers follow the HTTP protocol in communicating with one another. The requests for data sent by the browser are called HTTP requests, and the responses sent by the web server are called HTTP responses.

The HTTP protocol is built on top of the TCP protocol. To establish communication, a process on the client system first creates a socket. This socket will serve as the access point for the data to be transferred. Next, the process initiates a TCP connection to the server. This is done by sending a request to establish the connection. If the server accepts, the connection is established, and HTTP messages are then exchanged between the two processes. Once the data transmission concludes, the TCP connection is closed.

Under the HTTP protocol, the server does not store data about past client requests. Thus, servers that follow the HTTP protocol (called HTTP servers) are often characterized as stateless. Even if a client process $P$ asks for weather.com at 8 A.M. every day, to the server, each request appears as if the user is asking for weather.com for the first time. In fact, it runs even deeper than that: It's as if the server is encountering $P$ for the very first time.

Why are they stateless? This question is best answered by considering what might happen if they weren't stateless, i.e., if they were stateful. A stateful server would be incredibly complicated. It would need sophisticated garbage collection to manage its memory usage, methods of ensuring that it and the client are on the same page in the event it crashes, and methods for balancing its resources between processing requests and managing state. Only the most resource-laden applications can afford to implement and maintain stateful servers.

HTTP Connections

There are two types of HTTP connections: (1) nonpersistent HTTP and (2) persistent HTTP. Nonpersistent HTTP is governed by the HTTP/1.0 protocol, and persistent HTTP is governed by HTTP/1.1. Most connections today require HTTP/1.1 compliance, but, as with all web-related applications, there are still systems that use HTTP/1.0. As long as there are end users whose machines rely on HTTP/1.0, there will be web servers that maintain HTTP/1.0 compatibility.1

Nonpersistent HTTP

Initially, all HTTP connections were nonpersistent — at most, only one object could be sent over a TCP connection. Because of this limitation, the procedure for transmitting data looked something like the following:

  1. The user enters a URL into their browser tab (www.site.com/foo/index.html).
  2. The HTTP client initiates a TCP connection with the HTTP server at the system named www.site.com. Specifically, at a port on that system numbered 80.
  3. The HTTP server at www.site.com accepts the connection and notifies the client.
  4. The HTTP client sends an HTTP request (containing the URL) through its TCP connection socket. This message contains the information: "Give me www.site.com/foo/index.html."
  5. The HTTP server receives the message and forms an HTTP response containing www.site.com/foo/index.html. It then sends the message through its TCP connection socket.
  6. Once the request has been served (the server finishes doing all it's been asked to do), the server closes the TCP connection.
  7. The HTTP client receives the response message, and begins parsing the HTML file.
  8. During that parsing process, it might encounter more objects — jpeg files, JavaScript files, PHP files, MP3 files, MP4 files, and so on. So, the HTTP client initiates another TCP connection, and the cycle repeats.

The time it takes to send a small packet from client to server and back is called the round trip time ($T_{\text{round}}$). The response time ($T_{\text{response}}$) is the time from when the client requests a file to when it receives that file. The time it takes the server to push the file's bits onto the link is called the file transmit time ($T_{\text{file}}$). Under nonpersistent HTTP, each object costs one round trip to establish the TCP connection, another round trip to send the request and start receiving the file, plus the transmit time:

$$T_{\text{response}} = 2T_{\text{round}} + T_{\text{file}}$$

Notice that if we had five objects to download, we'd pay $T_{\text{response}} \times 5$, and if we had a hundred, we'd pay $T_{\text{response}} \times 100$.
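
To put illustrative numbers on this (the figures are made up): suppose $T_{\text{round}} = 50$ ms and $T_{\text{file}} = 10$ ms. Then each object costs

$$T_{\text{response}} = 2(50\ \text{ms}) + 10\ \text{ms} = 110\ \text{ms},$$

and a page with ten such objects takes roughly 1.1 seconds, most of it spent repeatedly setting up TCP connections.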

We wouldn't be wrong in characterizing this as an awfully inefficient method of transmitting data. But, we must keep context in mind. In the Internet's nursery years, it was astonishing that we could somehow get a digital image in Dallas, Texas to Ithaca, New York. At the time, the bar was to just get the transmission to occur — much like getting a parser to parse just one keyword.

Persistent HTTP

Within about three years of HTTP/1.0's release (yes, three — we aren't kidding when we say the Internet develops rapidly), people realized nonpersistent HTTP wasn't going to cut it. This led to HTTP/1.1 — the protocol governing persistent HTTP. There are two varieties of persistent HTTP: (1) persistent HTTP with pipelining (we'll refer to this as pipelined HTTP for short), and (2) persistent HTTP without pipelining (we'll refer to this as nonpipelined HTTP).

Persistent HTTP without Pipelining

Under nonpipelined HTTP, the client establishes a TCP connection just as we saw in HTTP/1.0, but now the TCP connection is kept open. The client continues asking for just one data object at a time. This yields the following procedural flow (a pseudocode sketch follows the list):

  1. Client sends a request to establish a TCP connection.
  2. If the server consents, a TCP connection is established.
  3. Client sends a request for object $A$ (the base HTML file).
  4. Server sends a response with the object $A$.
  5. Client encounters object $a_0$ inside $A$.
  6. Client sends a request for object $a_0$.
  7. Server responds with $a_0$.
  8. Steps 5 to 7 repeat for each of $a_1, a_2, a_3, \ldots, a_n$, but each request is sent only after the response to the previous one has been received.
  9. Once $a_n$ is received, the server closes the TCP connection.
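
In pseudocode, the client side of this flow might look like the following sketch (open_tcp_connection, parse_objects, and friends are invented names):

// nonpipelined: one outstanding request at a time
let connection = open_tcp_connection(host, 80)
let base = connection.request("/index.html")         // object A
for object in parse_objects(base) {                  // a_0, a_1, ..., a_n
	let response = connection.request(object.path)   // send, then wait for the response
	render(response)
}
connection.close()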

One question we might have: Why can't the server just send the entire object $A$ all at once? Why send each component object individually? The answer is the HTTP protocol. The server never makes a decision about what objects to send, because that decision should rest with the application developer. Perhaps $A$ is massive, and loading it all at once would trigger the user's operating system to kill the application. Or perhaps $A$ is a dynamic site that only loads content when the user scrolls. These are design options that would be much harder to implement if the server decided whether and when all or some parts of $A$ are sent.

Persistent HTTP with Pipelining

Pipelined HTTP is the default for HTTP/1.1. The procedural flow is essentially the same as nonpipelined HTTP, with one key difference: The client does not wait to receive the response to the last request (in the last section, step 8 of the procedural flow). Instead, the client immediately sends requests as soon as it encounters objects.
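
Under the same invented names as the last sketch, pipelining changes the loop: all requests go out immediately, and the responses are read as they arrive.

// pipelined: send every request up front, then collect the responses
let connection = open_tcp_connection(host, 80)
let objects = parse_objects(connection.request("/index.html"))
for object in objects {
	connection.send_request(object.path)     // don't wait for a response
}
for object in objects {
	render(connection.read_response())       // responses arrive in request order
}
connection.close()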

Browser Workflow

Let's take a closer look at a typical browser workflow for loading a website. Let's say user $C$ wants to visit the website www.site.com. The user inputs a string value site.com and hits enter. Implicitly, the input is actually: http://www.site.com (or, more commonly today, https://www.site.com). This string value is called a URL (Uniform Resource Locator), and it consists of certain parts:

scheme    hostname       path
http://   www.site.com   /index.html

Each of these parts communicates a particular piece of information: The scheme communicates the protocol used to get the data (in this case, the HTTP protocol). The hostname communicates where the data exists (a system named www.site.com). And the path communicates what data we're asking for from that system.

Receiving the string, the browser splits the string into the aforementioned parts. This can be as simple as using a string split method, or a more complicated parsing method. E.g., in pseudocode:

function request(string url) {
	assert(url.starts_with("http://"))

	let url        = url.substring("http://")   // strip the scheme
	let host, path = url.split_at("/")          // split at the first "/"
	let path       = "/" + path                 // restore the leading "/"
}

Above, our hypothetical browser uses a function called request that takes a string argument (the URL entered by the user). This function is called when the user hits enter.

Once the input string is parsed, the browser must execute a system call — a request to the operating system (OS) — to establish a TCP connection with the server named www.site.com. To do so, the browser must instantiate a socket. Sockets are generally provided by some socket library (e.g., Python's socket library). The code might look something like:

import socketLib

function request(string url) {
	let socket = socketLib.socket(
		family   = socketLib.AF_INET,
		type     = socketLib.SOCK_STREAM,
		protocol = socketLib.IPPROTO_TCP
	)

	assert(url.starts_with("http://"))

	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path
}

Again, this is just pseudocode. How a browser creates its sockets entirely depends on its implementation language and the libraries it uses, if any. That said, many of the socket APIs will look similar to the pseudocode above.

All sockets have an address family, which indicates how a machine can be found. In the code above, we've indicated AF_INET, the address family for IPv4 addresses. Other address families include AF_INET6 (for IPv6 addresses) and AF_BLUETOOTH (for Bluetooth addresses).

Sockets also have a type. This indicates what kind of data format will be transmitted. In this case, we've indicated SOCK_STREAM, which corresponds to the TCP protocol ("the data will be transmitted as a reliable stream of bytes"). There's also SOCK_DGRAM, corresponding to the UDP protocol ("the data will be transmitted as datagrams").

Finally, sockets have a transport protocol, as we know. This indicates the transport protocol for transmitting the data. Above, we've indicated IPPROTO_TCP (another option is IPPROTO_UDP).

Once the socket is instantiated, the browser uses it to make a call to the operating system:

import socketLib

function request(string url) {
	let socket = socketLib.socket(
		family   = socketLib.AF_INET,
		type     = socketLib.SOCK_STREAM,
		protocol = socketLib.IPPROTO_TCP
	)

	assert(url.starts_with("http://"))

	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path

	socket.connect(host, 80)
}

Receiving the system call, the OS sends the host name to a DNS (Domain Name System) server. This is a server elsewhere on the network that converts the host name into a destination IP address (perhaps something like 159.89.245.44). The DNS server looks up the host name in its directory, and sends the corresponding IP address back to the OS. We can actually see this at work with the dig command:

$ dig +short sublimis.com
159.89.245.44

Above, the host name sublimis.com resolves to the IP address 159.89.245.44.

Once the operating system receives the IP address, it must make a decision: Should it use a wired or a wireless connection to reach this IP address? The OS makes this decision by examining a routing table. Again, the details behind this routing table are complex and left to a later chapter, but in short, this table provides information that allows the operating system to determine the fastest way to get signals to the destination IP address.

Let's say the OS goes with wireless. The OS prepares a request to establish a TCP connection. Using its device drivers, the bits comprising this data are eventually transformed to signals and emitted over the air. Those signals are picked up by a nearby access point (e.g., a WiFi receiver), which transforms them back into data. That receiver sees the IP address, and forwards the data to the next system along the route, all the way to the server's IP address. As we saw earlier, once this message gets to the server, the server must either accept or refuse to establish a TCP connection.

Let's say it agrees. The server sends back its "Ok" response to the OS, and the OS sends its response to the browser: "You're now connected to 159.89.245.44". One program we can use to see this process at work is Telnet (installation may be required).

$ telnet sublimis.com
Trying 159.89.245.44...
Connected to sublimis.com.

HTTP Request Message

Above, we see that a TCP connection has been established with sublimis.com. Once a connection is established, the browser requests the webpage from the server by specifying its path. Most websites have a default path (e.g., an index.html file) aliased to the host name entered. This request is a data object called an HTTP request message. It looks something like the following (note that there's a blank line after the last visible line):

GET /dir/index.html HTTP/1.1
Host: www.site.com
User-agent: Mozilla/10.4
Accept-Language: en-us, en
Accept-Encoding: gzip, deflate
Accept-Charset: utf-8
Keep-Alive: 300
Connection: keep-alive

The browser sends this request message through the socket it instantiated:

import socketLib

function request(string url) {
	let socket = socketLib.socket(
		family   = socketLib.AF_INET,
		type     = socketLib.SOCK_STREAM,
		protocol = socketLib.IPPROTO_TCP
	)

	assert(url.starts_with("http://"))

	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path

	socket.connect(host, 80)

	socket.send("GET {} HTTP/1.0\r\n".format(path).encode("utf8") + 
				"Host: {}\r\n\r\n".format(host).encode("utf8"))
}

A few things to note about this illustration. Notice the carriage return and newline characters, \r\n. Each line of the request ends with \r\n, and two in a row (the \r\n\r\n at the end) mark the end of the request. Next, notice the call to encode(). For this pseudocode library, this call converts the strings into raw bytes, using an encoding scheme called UTF-8, a standardized mapping of integers to letters. These raw bytes are received by the server process on the other end, which has its own program for decoding the bytes received.

At this point, it should be even more clear why we need protocols. Both the browser and the server have agreed to the format above. Thus, when the server receives the bytes in the example, it knows to interpret specific bytes as communicating specific pieces of information. Without protocols, neither client nor server would have any idea what they're sending to one another.

HTTP Methods

HTTP request messages are requests for the server to do something. HTTP establishes what clients can ask the server to do. These are called HTTP Methods. In the last pseudocode example, we used the HTTP GET method.

Another method is called POST. Where GET is used to receive data from the server, the POST method is used to send data to the server. For example: inputting a credit card number on an eCommerce site, submitting our email to a subscription service, or publishing a Tweet. A POST message has the following format:

POST /dir/register.asp HTTP/1.1
Host: www.site.com
User-agent: Mozilla/10.0
Accept: text/xml, text/html, text/plain, image/jpeg
Accept-Language: en-us, en
Accept-Encoding: gzip, deflate
Accept-Charset: utf-8
Connection: close

Like the GET method, all of this data is written as a string value but encoded to raw bytes before it's sent to the server.

import socketLib

function post(string url) {
	let socket = socketLib.socket(
		family   = socketLib.AF_INET,
		type     = socketLib.SOCK_STREAM,
		protocol = socketLib.IPPROTO_TCP
	)
	// user enters URL
	assert(url.starts_with("http://"))

	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path

	socket.connect(host, 80)

	socket.send("POST {} HTTP/1.1\r\n".format(path).encode("utf8") + 
				"Host: {}\r\n\r\n".format(host).encode("utf8"))
}
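
One caveat: a real POST also carries the data being sent, in a message body after the blank line, along with a Content-Length header so the server knows how many bytes to read. A sketch of what the send above might look like instead (the form data here is invented):

// inside the post() function, replacing the send above
let body = "email=devin%40example.com"    // hypothetical form data
socket.send("POST {} HTTP/1.1\r\n".format(path).encode("utf8") +
			"Host: {}\r\n".format(host).encode("utf8") +
			"Content-Length: {}\r\n\r\n".format(body.length).encode("utf8") +
			body.encode("utf8"))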

Other methods include PUT, which uploads a file to the server (e.g., uploading a file to some cloud storage) and DELETE, which deletes the file on the server, as specified in the URL field (e.g., deleting a file from cloud storage).

Reading the HTTP Response Message

Once the server receives the client's HTTP request, it forms its own response message, and sends it back to the client. To read this response, the browser uses its socket to create a file-like data object (we can think of it like a struct):

let response = socket.makefile("r", encoding="utf8", newline="\r\n")

This object contains all of the bytes sent from the server, as well as an API for parsing and interpreting the bytes. Groups of bytes map to certain pieces of information, and those groups are delineated with new lines.

The first line corresponds to an HTTP Status Code. HTTP provides numerous status codes, but only a subset is used extensively. Commonly used codes include:

Status Code   Name                         Meaning
200           OK                           Request succeeded; the data object is later in the message
301           Moved Permanently            Requested data object is now at a new location, specified later in the message
400           Bad Request                  Request message was not understood by the server
404           Not Found                    Requested data object not found on the server
505           HTTP Version Not Supported   Client is using an HTTP version the server does not follow

Reading the response message, the browser first checks for these status codes, performing error handling as needed.

import socketLib

function request(string url) {
	let socket = socketLib.socket(
		family   = socketLib.AF_INET,
		type     = socketLib.SOCK_STREAM,
		protocol = socketLib.IPPROTO_TCP
	)

	assert(url.starts_with("http://"))

	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path

	socket.connect(host, 80)

	socket.send("GET {} HTTP/1.0\r\n".format(path).encode("utf8") + 
				"Host: {}\r\n\r\n".format(host).encode("utf8"))

	let response = socket.makefile("r", encoding="utf8", newline="\r\n")
	let statusline = response.readline()
	let version, status, reason = statusline.split(" ", 2)
	if (status === "200") {
		let headers = new hash_table()
		while true:
		let line = response.readline()
		if line === "\r\n" {
			break
		}
		let header = line.split(":", 1)
		let value = line.split(":", 1)
		headers.insert(header, value)
	} else {
		switch(status):
			case "301":
				handle_301_error()
			case "400":
				handle_400_error()
			...
	}
}

Assuming all goes well, the browser stores the information received in some data structure of key-value pairs (where the key is a header and the value is the associated data), likely using a hash table variant.

There are numerous HTTP response headers, but two are particularly important to browsers: transfer-encoding and content-encoding. The transfer-encoding header tells the browser what encoding the server used to transfer the requested data object. The browser needs this information because the bytes contained in the response's body (the field containing the requested data object) can be formatted in many different ways. The body might be chopped into smaller pieces (transfer-encoding: chunked), compressed (transfer-encoding: compress), gzipped (transfer-encoding: gzip), or deflated (transfer-encoding: deflate). Each of these options uses a particular algorithm for encoding and decoding the data object.

The content-encoding header takes largely the same values (chunked applies only to transfer-encoding) and communicates the same kind of information. The reason the browser needs both of these headers is that servers are free to use either of the two. Most servers today use content-encoding, but because a server might use transfer-encoding, checking both headers is the only way to ensure the browser uses the correct decoding scheme.
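
A sketch of how a browser might dispatch on these headers (the decoder functions are hypothetical placeholders):

// pick a decoder based on whichever encoding header is present
function decode_body(headers, body) {
	let encoding = "identity"                       // default: body is used as-is
	if headers.has("content-encoding") {
		encoding = headers.get_value("content-encoding")
	} else if headers.has("transfer-encoding") {
		encoding = headers.get_value("transfer-encoding")
	}
	switch (encoding) {
		case "gzip":    return gunzip(body)         // hypothetical decoder
		case "deflate": return inflate(body)        // hypothetical decoder
		case "chunked": return join_chunks(body)    // reassemble the pieces
		default:        return body
	}
}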

Once the headers have been read, the browser proceeds to read the body:

// inside the request() function

let body = response.read()
socket.close()

The body variable above contains a long string of HTML code:

<title>site.com</title><meta charset="utf-8"> ...

With that, the rest is in another browser module's hands (namely, an HTML parser). This leads to the world of browser engineering, a field unto itself and far outside the scope of this chapter.

Cookies

Earlier, we mentioned that under the HTTP protocol, servers are essentially stateless. Having state, however, would be extremely useful: It allows users to pick up where they left off the last time they used the network application. While server-side state isn't feasible for extremely large applications, such applications can achieve a similar benefit through cookies.

Generally, a cookie is a key-value pair, where the key is a host name, and the value is some integer. The cookies are kept in some container with fast read and write times — e.g., a hash table — colloquially called a cookie jar. We can imagine it as looking something like:

let COOKIE_JAR = {
	amazon: 1678,
	youtube: 8731,
	netflix: 3191
}

When the user sends an HTTP request, the server sends an HTTP response as usual. If the site uses cookies, however, the HTTP response will contain a special header called Set-Cookie. When the browser reads the response headers with its socket, it updates the cookie jar. For example, the browser might have some function that looks like:

function request() {
	...
	let url        = url.substring("http://")
	let host, path = url.split_at("/")
	let path       = "/" + path
	...

	let headers = new hash_table()
	...
	if headers.has("set-cookie") {
		let cookie = headers.get_value("set-cookie")
		COOKIE_JAR[host] = cookie
	}
}

The cookie jar is maintained as long as the user doesn't clear it (e.g., when clearing Chrome's browser history, there's an option to clear "Cookies and other site data").

Using the example cookie jar above, let's say the site is Amazon. The user puts some items in the cart and leaves without purchasing. The next time the user visits Amazon, the browser will send an HTTP request with a Cookie header:

"GET ... cookie: 1678 \n ... "

When Amazon receives this request, it sees the cookie field and looks up the value in its database. If it finds the value, it retrieves the user's associated data (e.g., what product was last left in their cart). If Amazon doesn't recognize the cookie, it assumes that the client is an entirely new user and creates a new cookie for them. Each time the server receives an HTTP request, it updates the data associated with the user.
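
The server side might look something like this sketch (the database calls and generate_cookie are invented):

// server: map cookie values to per-user state
function on_request(request) {
	let cookie = request.headers.get_value("cookie")
	let user
	if cookie != null and database.has(cookie) {
		user = database.get(cookie)            // returning user: restore their state
	} else {
		cookie = generate_cookie()             // new user: mint a fresh cookie
		user   = database.create(cookie)
	}
	user.update(request)                       // e.g., record what's in the cart
	return respond(request, user, cookie)      // echo the cookie back via Set-Cookie
}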

Cookies make the web feel seamless and personalized. They allow developers to implement subtle and convenient features that we often take for granted — watching a YouTube video or Netflix movie right where we left off, keeping things in our shopping cart, getting suggestions in Google search, ensuring the webpage is rendered according to user preferences (e.g., executing optimizations for blind users or rendering the page according to the user's spoken language), and many others.

On the other hand, cookies also raise numerous privacy concerns. It's not always clear what data a network application tracks or shares with other applications. In 2011, an amendment to the European Union's ePrivacy Directive (colloquially called the EU Cookie Law) took effect, requiring data controllers (entities that gather data from EU residents) to ask for consent before using cookies. Given how many end users there are in the EU, this has led to what's steadily becoming the Internet's bumper sticker: "We use cookies to improve your experience on our site and to show you relevant advertising."

Web Caches

As we know from computer architecture, a cache is a small, fast region of memory $R_n$ that sits closer to the processor than some larger, slower region $R_{n+1}$. The closer we store data to the processor, the faster the processor can get at it. This same idea extends to networking: If a particular data object is requested often enough, a typical modern server will store the data object in its cache. For example, if a server at MIT notices that some UIUC professor's lecture notes page is constantly being accessed, the MIT server will likely cache that page. This has several benefits: (1) It gives the users faster access times, and (2) it lets the server reserve outbound traffic for requests that must go beyond its LAN.
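
A sketch of the cache check (purely illustrative; real caches also honor expiration headers, which we've omitted):

// serve from the cache when possible; otherwise fetch and remember
function get(url) {
	if cache.has(url) {
		return cache.get(url)          // fast path: no trip beyond the LAN
	}
	let response = fetch_from_origin(url)
	cache.insert(url, response)        // remember it for the next requester
	return response
}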

Footnotes

  1. Before we criticize users on such old machines or the servers that keep these users connected, it's important to recognize that backwards compatibility is one reason why there's so much available information on the Internet. Not everyone can afford the latest and fastest technologies, nor does everyone want to. Many web users just want to write and share information; others just want to consume. If a 16-inch MacBook Pro with 64GB of RAM were a prerequisite to that privilege, the Internet would be a far different place. Of course, this kind of backwards compatibility makes web development and browser engineering somewhat of a wild west compared to other fields of software engineering — thousands of libraries that all seem like clone troopers in different stripes, and frameworks that spread and dissipate like California wildfires. The easiest way to spark (or kill) conversation with a web developer at a cocktail party: Ask them what they think about Internet Explorer.