In this article, we will go through the basic concepts of HTTP.

But why HTTP?

Why should I read about HTTP you may ask yourself?

Well, if you are a software developer, you will understand how to write better applications by learning how they communicate. If you are a system architect or network admin, you will get deeper knowledge of designing complicated network architectures.

Support Code Maze on Patreon to get rid of ads and get the best discounts on our products!
Become a patron at Patreon!

REST, which is a very important architectural style nowadays is relying completely on utilizing HTTP features, which makes HTTP even more important to understand. If you want to make great RESTful applications, you must understand HTTP first.

I should note that REST doesn’t rely on HTTP only. It can be implemented using other protocols, but it seems that HTTP won that battle by a fair margin, and you’ll hardly find the REST implementation using other protocols.

So are you willing to pass on the chance to understand and learn the fundamental concepts of the World Wide Web and network communication?

I hope not 🙂

The article will focus on the most important parts of HTTP and attempt to explain them as simply as possible. The idea is to organize all the useful information about HTTP in one place, to save you the time of going through books and RFCs to find the information you need.

This is the first article of the HTTP series. It gives a short introduction to the basic concepts of HTTP.

You will learn about:

Without further ado, let’s dive in.

HTTP Definition

The founder of HTTP is Tim Berners-Lee (the guy also considered to be the inventor of the World Wide Web). Among other names important to the development of HTTP is also Roy Fielding, who is also the originator of the REST architectural style.

The Hypertext Transfer Protocol is the protocol that applications use to communicate with each other. In essence, HTTP is in charge of delegating all of the internet’s media files between clients and servers. That includes HTML, images, text files, movies, and everything in between. And it does this quickly and reliably.

HTTP is the application protocol and not the transport protocol because we are using it for communication in the application layer. To jog your memory here is what the Network Stack looks like.

Network stack

From this image, you can clearly see that HTTP is the application protocol and that TCP works on the transport layer.

Resources

resource

Everything on the internet is a resource, and HTTP works with resources. That includes files, streams, services, and everything else. An HTML page is a resource, a youtube video is a resource, your spreadsheet of daily tasks on a web application is a resource… You get the point.

And how do you differentiate one resource from another?

By giving them URLs (Uniform resource locators).

A URL points to the unique location where the resource is located.

How To Exchange Messages Between a Web Client and a Web Server

Every piece of content, and every resource lives on some Web server (HTTP server). These servers are expecting requests for those resources.

But how do you request a resource from a Web server?

You need a client of course 🙂

You are using an HTTP client right now to read this article. Web browsers are HTTP clients. They communicate with HTTP servers to fetch the resources to your computer. Some of the most popular clients are Google’s Chrome, Mozilla’s Firefox, Opera, Apple’s Safari, and unfortunately still the infamous Internet Explorer.

Some Message Examples

So what does an HTTP message look like?

Without talking too much about it, here are some examples of HTTP messages:

GET request

GET /repos/CodeMazeBlog/ConsumeRestfulApisExamples HTTP/1.1
Host: api.github.com
Content-Type: application/json
Authorization: Basic dGhhbmtzIEhhcmFsZCBSb21iYXV0LCBtdWNoIGFwcHJlY2lhdGVk
Cache-Control: no-cache

POST request

POST /repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks?access_token=5643f4128a9cf974517346b2158d04c8aa7ad45f HTTP/1.1
Host: api.github.com
Content-Type: application/json
Cache-Control: no-cache

{
  "url": "http://www.example.com/example",
  "events": [
    "push"
  ],
  "name": "web",
  "active": true,
  "config": {
    "url": "http://www.example.com/example",
    "content_type": "json"
  }
}

Here is an example of one GET and one POST request. Let’s go quickly through the different parts of these requests.

The first line of the request is reserved for the request line. It consists of the request method name, the request URI, and the HTTP version.

The next few lines represent the request headers. Request headers provide additional info to the requests, like the content types the request expects in response, authorization information, etc,

For a GET request, the story ends right there. A POST request can also have a body and carry additional info in the form of a body message. In this case, it is a JSON message with additional info on how to create the GitHub webhook for the given repo specified in the URI. That message is required for the webhook creation so we are using a POST request to provide that information to the GitHub API.

The Request line and request headers must be followed by <CR><LF> (carriage return and line feed \r\n), and there is a single empty line between the message headers and the message body that contains only CRLF.

Reference for an HTTP request: https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html

And what do we get as a response to these requests?

Response message

HTTP/1.1 200 OK
Server: GitHub.com
Date: Sun, 18 Jun 2017 13:10:41 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Status: 200 OK
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4996
X-RateLimit-Reset: 1497792723
Cache-Control: private, max-age=60, s-maxage=60

[
  {
    "type": "Repository",
    "id": 14437404,
    "name": "web",
    "active": true,
    "events": [
      "push"
    ],
    "config": {
      "content_type": "json",
      "insecure_ssl": "0",
      "url": "http://www.example.com/example"
    },
    "updated_at": "2017-06-18T12:17:15Z",
    "created_at": "2017-06-18T12:03:15Z",
    "url": "https://api.github.com/repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks/14437404",
    "test_url": "https://api.github.com/repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks/14437404/test",
    "ping_url": "https://api.github.com/repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks/14437404/pings",
    "last_response": {
      "code": 422,
      "status": "misconfigured",
      "message": "Invalid HTTP Response: 404"
    }
  },
]

The response message is pretty much structured the same as the request, except the first line, called the status line, which surprising as it is, carries information about the response status.

Response headers and response body come right after the status line.

Reference for HTTP response: https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html

MIME Types

MIME types represent a standardized way to describe the file types on the internet. Your browser has a list of MIME types and the same goes for web servers. That way we can transfer files in the fashion regardless of the operating system.

A fun fact is that MIME stands for the Multipurpose Internet Mail Extension because they were originally developed for multimedia email. They were adapted to be used for HTTP and several other protocols since.

Every MIME type consists of a type, subtype, and a list of optional parameters in the following format: type/subtype; optional parameters.

Here are a few examples:

Content-Type: application/json
Content-Type: text/xml; charset=utf-8
Accept: image/gif

You can find the list of commonly used MIME types and subtypes in the HTTP reference.

Request Methods

HTTP request methods (referred to also as “verbs”) define the action that will be performed on the resource. HTTP defines several request methods. The most commonly known/used are GET and POST methods.

A request method can be idempotent or not idempotent. This is just a fancy term for explaining that the method is safe/unsafe to be called several times on the same resources. In other words, that means that GET a method, that has the sole purpose of retrieving information, should by default be idempotent. Calling GET on the same resource over and over should not result in a different response. On the other hand, the POST method is not an idempotent method.

Prior to HTTP/1.1, there were just three methods: GET, POST, and HEAD, and the specification of HTTP/1.1 brought a few more methods into the play: OPTIONS, PUT, DELETE, TRACEand CONNECT.

Find more about how each one of these methods works in the HTTP Reference.

Headers

Header fields are colon-separated name-value fields you can find just after the first line of a request or response message. They provide more context to the messages and inform clients and servers about the nature of the request or response.

There are five types of headers:

  • General headers: These headers are useful to both the server and the client. One good example is the Date header field which provides information about the time of the message creation.
  • Request headers: Specific to the request messages. They provide the server with additional information. For example, the Accept: */* header field informs the server that the client is willing to receive any media type.
  • Response headers: Specific to the response messages. They provide the client with additional information. For example, the Allow: GET, HEAD, PUT header field informs the client which methods are allowed for the requested resource.
  • Entity headers: These headers deal with the entity-body. For example, the Content-Type: text/html header lets the application know that the data is an HTML document.
  • Extension headers: These are nonstandard headers application developers can construct. Although they are not part of HTTP, it tolerates them.

You can find the list of commonly used request and response headers in the HTTP Reference.

Status Codes

404batman

A status code is a three-digit number that denotes the result of a request. The reason phrase which is a humanly readable status code explanation comes right after.

Some examples include:

  • 200 OK
  • 404 Not Found
  • 500 Internal Server Error

The status codes are classified by the range in five different groups.

Both the status code classification and the entire list of status codes and their meaning can be found in the HTTP Reference.

Conclusion

Phew, that was a lot of information.

The knowledge you gain by learning HTTP basic concepts is not the kind that helps you to solve some problems directly. But it gives you an understanding of the underlying principle of internet communication which you can apply to almost every other problem on a higher level than HTTP. Whether it is REST, APIs, web application development, or network, you can now be at least a bit more confident while solving these kinds of problems.

Of course, HTTP is a pretty large topic to talk about and there is still a lot more to it than the basic concepts.

Read about the architectural aspects of HTTP in part 2 of the series.

Liked it? Take a second to support Code Maze on Patreon and get the ad free reading experience!
Become a patron at Patreon!