In this article, I will present the basics of HTTP.
But why HTTP?
Why should I read about HTTP, you may ask yourself?
Well, if you are a software developer, you will understand how to write better applications by learning how they communicate. If you are a system architect or network admin, you will gain deeper knowledge for designing complicated network architectures.
REST, which is a very important architectural style nowadays, relies heavily on HTTP features, which makes HTTP even more important to understand. If you want to make great RESTful applications, you must understand HTTP first.
I should note that REST doesn’t rely on HTTP only. It can be implemented using other protocols, but it seems that HTTP won that battle by a wide margin, and you’ll hardly find REST implementations that use other protocols.
So are you willing to pass on the chance to understand and learn the fundamental concepts of the World Wide Web and network communication?
I hope not 🙂
The article will focus on the most important parts of HTTP and attempt to explain them as simply as possible. The idea is to organize all the useful information about HTTP in one place, to save you the time of going through books and RFCs to find the information you need.
This is the first article of the HTTP series. It gives a short introduction to the most important concepts of HTTP.
- The HTTP series (Part 1): Overview of the basic concepts
- The HTTP series (Part 2): Architectural aspects
- The HTTP series (Part 3): Client identification
- The HTTP series (Part 4): Authentication mechanisms
- The HTTP series (Part 5): Security
- The HTTP Reference
You will learn about:
- What HTTP is exactly
- How messages are exchanged between the Web Client and the Web Server
- Messages and some message examples
- MIME types
- Request Methods
- Status codes
Without further ado, let’s dive in.
HTTP was created by Tim Berners-Lee (the guy also considered to be the inventor of the World Wide Web). Another name important to the development of HTTP is Roy Fielding, who is also the originator of the REST architectural style.
The Hypertext Transfer Protocol is the protocol that applications use to communicate with each other. In essence, HTTP is in charge of transferring all of the internet’s media files between clients and servers. That includes HTML, images, text files, movies and everything in between. And it does this quickly and reliably.
HTTP is an application protocol and not a transport protocol, because it is used for communication in the application layer. To jog your memory, here is what the network stack looks like.
From this image, you can clearly see that HTTP is an application protocol and that TCP works on the transport layer.
Everything on the internet is a resource, and HTTP works with resources. That includes files, streams, services and everything else. An HTML page is a resource, a YouTube video is a resource, your spreadsheet of daily tasks on a web application is a resource… You get the point.
And how do you differentiate one resource from another?
By giving them URLs (Uniform Resource Locators).
A URL points to the unique location where the resource can be found.
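To see what a URL is made of, here is a small sketch using Python's standard `urllib.parse` module. The URL itself is a hypothetical example (the host and the `page` query parameter are assumptions, not taken from a real request).

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL for illustration; host and query string are assumptions.
url = "https://api.github.com/repos/CodeMazeBlog/ConsumeRestfulApisExamples?page=2"

parts = urlsplit(url)

print(parts.scheme)           # protocol used to reach the resource
print(parts.netloc)           # server that hosts the resource
print(parts.path)             # location of the resource on that server
print(parse_qs(parts.query))  # optional query parameters as a dict of lists
```

The scheme tells the client which protocol to speak, the host tells it which server to contact, and the path identifies the resource on that server.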
Every piece of content, every resource lives on some Web server (HTTP server). These servers are expecting HTTP requests for those resources.
But how do you request a resource from a Web server?
You need an HTTP client of course 🙂
You are using an HTTP client right now to read this article. Web browsers are HTTP clients. They communicate with HTTP servers to fetch the resources to your computer. Some of the most popular clients are Google’s Chrome, Mozilla’s Firefox, Opera, Apple’s Safari, and unfortunately still the infamous Internet Explorer.
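A browser is not the only kind of HTTP client. As a rough sketch of the client/server exchange, the snippet below starts a throwaway server on the local machine and fetches one resource from it, using only Python's standard library. The handler name, the `/hello` path and the body text are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one tiny text resource."""

    def do_GET(self):
        body = b"Hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_once():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/hello")  # the client asks for a resource
        response = conn.getresponse()  # the server answers with a response
        result = (response.status, response.reason, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, reason, text = fetch_once()
print(status, reason, text)
```

The same request/response cycle happens when a browser loads a page, only with far more headers and usually many more requests.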
So what does an HTTP message look like?
Without talking too much about it, here are some examples of HTTP messages:
GET /repos/CodeMazeBlog/ConsumeRestfulApisExamples HTTP/1.1
Authorization: Basic dGhhbmtzIEhhcmFsZCBSb21iYXV0LCBtdWNoIGFwcHJlY2lhdGVk
POST /repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks?access_token=5643f4128a9cf974517346b2158d04c8aa7ad45f HTTP/1.1
Here is an example of one GET and one POST request. Let’s go quickly through the different parts of these requests.
The first line of the request is reserved for the request line. It consists of the request method name, the request URI, and the HTTP version.
The next few lines represent the request headers. Request headers provide additional info about the request, like the content types the client expects in the response, authorization information, etc.
For the GET request, the story ends right there. A POST request can also have a body and carry additional info in the form of a body message. In this case, it is a JSON message with additional info on how the GitHub webhook should be created for the given repo specified in the URI. That message is required for the webhook creation so we are using a POST request to provide that information to the GitHub API.
The request line and each request header must be followed by <CR><LF> (carriage return and line feed, \r\n), and a single empty line containing only CRLF separates the message headers from the message body.
Reference for HTTP request: https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
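The request structure described above (request line, headers, empty line, body) can be sketched as a small helper that assembles the raw text of a request. This is only an illustration: the `build_request` function, the `Host` value and the JSON body are assumptions, not the actual payload GitHub expects.

```python
CRLF = "\r\n"


def build_request(method, target, headers, body=""):
    """Assemble a raw HTTP/1.1 request message as text."""
    # Request line: method, request URI and HTTP version.
    lines = [f"{method} {target} HTTP/1.1"]
    # One "Name: value" line per header.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line separates headers from body.
    return CRLF.join(lines) + CRLF + CRLF + body


# Hypothetical body and header values, for illustration only.
body = '{"name": "web"}'
raw = build_request(
    "POST",
    "/repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks",
    {
        "Host": "api.github.com",
        "Content-Type": "application/json",
        "Content-Length": str(len(body.encode("utf-8"))),
    },
    body,
)

# Splitting at the first empty line recovers the two halves of the message.
head, _, payload = raw.partition(CRLF + CRLF)
print(head.split(CRLF)[0])
print(payload)
```

In practice an HTTP client library builds this text for you, but seeing it spelled out makes it clear that an HTTP/1.1 request is just structured text.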
And what do we get as a response to these requests?
HTTP/1.1 200 OK
Date: Sun, 18 Jun 2017 13:10:41 GMT
Content-Type: application/json; charset=utf-8
Status: 200 OK
Cache-Control: private, max-age=60, s-maxage=60
"message": "Invalid HTTP Response: 404"
The response message is structured pretty much the same as the request, except for the first line, called the status line, which, surprising as it is, carries information about the response status.
The status line is followed by the response headers and response body.
Reference for HTTP response: https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
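Going the other way, here is a minimal sketch of how a client might take a raw response apart into status line, headers and body. The `parse_response` helper is hypothetical, and the `{}` body is a placeholder; the headers are the ones from the sample response above.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line: HTTP version, status code, reason phrase.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sun, 18 Jun 2017 13:10:41 GMT\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
    "Cache-Control: private, max-age=60, s-maxage=60\r\n"
    "\r\n"
    "{}"  # placeholder body, not the real GitHub response
)

version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers["Content-Type"])
```

This naive parser ignores details such as repeated or case-insensitive header names, so treat it as a reading aid rather than something to use in production.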
MIME types are a standardized way to describe file types on the internet. Your browser has a list of MIME types, and the same goes for web servers. That way, files can be transferred the same way regardless of the operating system.
Fun fact: MIME stands for Multipurpose Internet Mail Extensions, because MIME types were originally developed for multimedia email. They have since been adapted for use with HTTP and several other protocols.
Every MIME type consists of a type, subtype and a list of optional parameters in the following format: type/subtype; optional parameters.
Here are a few examples:
Content-Type: text/xml; charset=utf-8
You can find the list of commonly used MIME types and subtypes in the HTTP reference.
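As a quick illustration of the type/subtype; parameters format, the sketch below splits a Content-Type value into its parts and asks Python's built-in `mimetypes` table which type it associates with a file name. The `parse_mime` helper is hypothetical and deliberately naive (it does not handle quoted parameter values that contain semicolons).

```python
import mimetypes


def parse_mime(value):
    """Split 'type/subtype; key=value' into (type, subtype, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    main_type, _, subtype = media_type.partition("/")
    params = {}
    for item in raw_params:
        key, _, val = item.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return main_type.lower(), subtype.lower(), params


mime = parse_mime("text/xml; charset=utf-8")
print(mime)

# The standard library ships its own list of known MIME types.
guessed, _ = mimetypes.guess_type("index.html")
print(guessed)
```

Servers typically use a lookup like the second one to choose the Content-Type header for a file they are about to send.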
HTTP request methods (also referred to as “verbs”) define the action that will be performed on the resource. HTTP defines several request methods. The most commonly used are the GET and POST methods.
A request method can be idempotent or not idempotent. This is just a fancy term for saying whether a method can be called several times on the same resource with the same effect as calling it once. For example, the GET method, whose sole purpose is retrieving information, should be idempotent: calling GET on the same resource over and over should not change that resource. On the other hand, the POST method is not idempotent.
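To make idempotency a bit more tangible, here is a toy sketch that models a server's resources as a plain dictionary. The `put` and `post` functions are hypothetical stand-ins for how PUT-like and POST-like operations typically behave; they are not real HTTP handlers.

```python
def put(store, key, value):
    """Idempotent: repeating the call leaves the store in the same state."""
    store[key] = value
    return store


def post(store, value):
    """Not idempotent: every call creates another entry."""
    new_key = len(store) + 1
    store[new_key] = value
    return store


# Calling put once or twice with the same arguments gives the same result.
once = put({}, "task-1", "write article")
twice = put(put({}, "task-1", "write article"), "task-1", "write article")
print(once == twice)

# Calling post twice creates two entries instead of one.
posted_once = post({}, "write article")
posted_twice = post(post({}, "write article"), "write article")
print(len(posted_once), len(posted_twice))
```

This is why a client can safely retry an idempotent request after a timeout, while blindly retrying a POST may create a duplicate.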
Prior to HTTP/1.1, there were just three methods: GET, POST and HEAD. The HTTP/1.1 specification brought a few more methods into play: OPTIONS, PUT, DELETE, TRACE and CONNECT.
Find out more about how each of these methods works in the HTTP Reference.
Header fields are colon-separated name-value fields you can find just after the first line of a request or response message. They provide more context to the HTTP messages and ensure clients and servers are appropriately informed about the nature of the request or response.
There are five types of headers:
- General headers: These headers are useful to both the server and the client. One good example is the Date header field, which provides information about when the message was created.
- Request headers: Specific to request messages. They provide the server with additional information. For example, the Accept: */* header field informs the server that the client is willing to receive any media type.
- Response headers: Specific to response messages. They provide the client with additional information. For example, the Allow: GET, HEAD, PUT header field informs the client which methods are allowed for the requested resource.
- Entity headers: These headers deal with the entity body. For example, the Content-Type: text/html header lets the application know that the data is an HTML document.
- Extension headers: These are nonstandard headers constructed by application developers. They are not part of HTTP but need to be tolerated.
You can find the list of commonly used request and response headers in the HTTP Reference.
The status code is a three-digit number that denotes the result of a request. It is followed by the reason phrase, which is a human-readable explanation of the status code.
Some examples include:
- 200 OK
- 404 Not Found
- 500 Internal Server Error
The status codes are classified by range into five different groups.
Both the status code classification and the entire list of status codes and their meaning can be found in the HTTP Reference.
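As a small sketch of that classification, the first digit of the status code determines its group. The `classify` helper below is hypothetical; the reason phrases come from Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus

# The five groups, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def classify(code):
    """Return the group name for a three-digit status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


for code in (200, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```

A client can often decide what to do next from the group alone, for example retrying later on a 5xx response but not on a 4xx one.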
Phew, that was a lot of information.
The knowledge you gain by learning HTTP is not the kind that helps you solve some problem directly. But it gives you an understanding of the underlying principles of internet communication, which you can apply to almost every problem at a higher level than HTTP. Whether it is REST, APIs, web application development or networking, you can now be at least a bit more confident while solving these kinds of problems.
Of course, HTTP is a pretty large topic to talk about and there is still a lot more to it than the basic concepts.
Was this article helpful to you? Please leave a comment and let me know.