Whether you’re a programmer or just someone who uses technology every day, it’s always good to know how things work behind the scenes. Let’s walk through an example with ‘google.com’.
The first thing that happens is that your browser gets the website’s IP address from the DNS. But what is the DNS to begin with? The DNS (Domain Name System) is a database that maps the domain name of a website to its IP address. For example, Google’s domain name is “google.com” and its IP address is something like “220.127.116.11”. Think of it as a phone book: a list of names, each with its corresponding phone number. The purpose is human-friendly navigation. Imagine having to remember the IP address of every web page you need; it’s much easier to remember the name of the page.
To find the DNS record, the browser checks four caches in order:
- The browser checks its own cache. For a limited time, it keeps the DNS records of websites you have already visited.
- If the record isn’t in the browser cache, the browser checks your OS cache.
- If that also fails, the request goes to your router, which keeps its own cache of DNS records.
- If all of the above fail, the request goes to your ISP (Internet Service Provider), which also keeps its own cache of DNS records. This is the last cache that can answer before a full lookup is needed.
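The cache chain above can be sketched as a simple ordered lookup. This is only an illustration: the `CACHES` list, the `lookup_in_caches` helper, and the cache contents are made up for the example, and real caches live inside the browser, the OS, the router, and the ISP’s resolver.

```python
from typing import Optional, Tuple

# Hypothetical cache contents, checked in the same order as the list above.
# 192.0.2.1 is a reserved documentation address, not a real site.
CACHES = [
    ("browser", {}),
    ("os", {}),
    ("router", {"example.com": "192.0.2.1"}),
    ("isp", {"google.com": "220.127.116.11"}),
]


def lookup_in_caches(domain: str) -> Optional[Tuple[str, str]]:
    """Return (cache_name, ip) from the first cache that knows the domain."""
    for name, cache in CACHES:
        ip = cache.get(domain)
        if ip is not None:
            return name, ip
    # Every cache missed: a full DNS query would be needed.
    return None


print(lookup_in_caches("google.com"))
print(lookup_in_caches("unknown.test"))
```

The first hit wins, so a record stored in the browser cache would never reach the router or the ISP.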
If all of these cache lookups fail, the ISP’s DNS server starts a DNS query to find the IP address of the website, in this case ‘google.com’.
The purpose of the DNS query is to find an IP address that isn’t stored in any of those caches. The query goes to multiple DNS servers until it finds the correct IP address. This is called a recursive search, because it continues from DNS server to DNS server until it either finds the IP address we need or returns an error response saying it was unable to find it.
In this situation, we would call the ISP’s DNS server a DNS recursor, whose responsibility is to find the proper IP address of the intended domain name by asking other DNS servers on the internet for an answer.
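To make the server-to-server search more concrete, here is a toy sketch of a resolver following referrals until it gets an answer. Everything in it is an assumption for illustration: the `SERVERS` table, the server names (`root`, `tld-com`, `ns-google`), and the `resolve` function are invented, and no real DNS traffic is sent.

```python
# Toy "DNS servers": each maps a name (or a parent suffix) to either a
# referral to another server or a final answer. All names are placeholders.
SERVERS = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"google.com": ("referral", "ns-google")},
    "ns-google": {"google.com": ("answer", "220.127.116.11")},
}


def resolve(domain, max_hops=10):
    """Follow referrals from server to server; return (ip_or_None, servers_asked)."""
    server = "root"
    asked = []
    for _ in range(max_hops):
        asked.append(server)
        records = SERVERS[server]
        # Pick the record whose key is the domain itself or one of its parents.
        match = next(
            (value for key, value in records.items()
             if domain == key or domain.endswith("." + key)),
            None,
        )
        if match is None:
            return None, asked  # this server cannot help: lookup fails
        kind, value = match
        if kind == "answer":
            return value, asked
        server = value  # referral: ask the next server
    return None, asked


print(resolve("google.com"))
```

The `max_hops` limit is there so that a broken chain of referrals cannot loop forever.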
Once the browser receives the correct IP address, it will build a connection with the server that matches the IP address to transfer information. Browsers use internet protocols to build such connections. There are several different internet protocols that can be used, but TCP is the most common protocol used for many types of HTTP requests.
To transfer data packets between your computer (Client) and the server, it is important to have a TCP connection established. This connection is established using a process called the TCP/IP three-way handshake. This is a three-step process where the client and the server exchange SYN (Synchronize) and ACK (Acknowledge) messages to establish a connection.
1. The client machine sends a SYN packet to the server over the internet.
2. If the server has open ports that can accept and initiate new connections, it’ll respond with an acknowledgment of the SYN packet using a SYN/ACK packet.
3. The client will receive the SYN/ACK packet from the server and will acknowledge it by sending an ACK packet.
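The three steps can be simulated with sequence numbers to show what each side acknowledges. This is a simplified model, not real networking: the `Segment` class and `three_way_handshake` function are hypothetical names, and the operating system normally performs the handshake for you when a socket connects.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str    # "SYN", "SYN/ACK" or "ACK"
    seq: int      # sender's sequence number
    ack: int = 0  # next sequence number expected from the other side


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged, given each side's initial sequence number."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server -> client: SYN/ACK acknowledging the client's SYN (seq + 1).
    syn_ack = Segment("SYN/ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK acknowledging the server's SYN (seq + 1).
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

After the third segment, both sides know the other’s starting sequence number and the connection is considered established.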
Once the TCP connection is established, it is time to start transferring data. Your browser will send a GET request asking for the ‘google.com’ web page. If you’re entering credentials or submitting a form, this could be a POST request. The request also contains additional information such as browser identification (User-Agent header), the types of content the browser will accept in the response (Accept header), and connection headers asking the server to keep the TCP connection alive for additional requests.
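As a rough idea of what such a request looks like on the wire, here is a sketch that builds the text of an HTTP/1.1 GET request. The `build_get_request` helper and the `ExampleBrowser/1.0` User-Agent value are made up for the example; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",            # request line: method, path, version
        f"Host: {host}",                   # which site we want on that server
        "User-Agent: ExampleBrowser/1.0",  # browser identification (made up)
        "Accept: text/html",               # content types we accept back
        "Connection: keep-alive",          # keep the TCP connection open
        "",                                # blank line ends the headers
        "",
    ]
    # HTTP uses CRLF ("\r\n") as its line separator.
    return "\r\n".join(lines)


print(build_get_request("google.com"))
```

The empty line at the end matters: it tells the server that the headers are finished and, for a GET request, that there is no body to wait for.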
The server runs a web server (e.g., Apache, IIS) that receives the request from the browser and passes it to a request handler, which reads it and generates a response. The request handler is a program that reads the request to check what is being requested and also updates the information on the server if needed. Then it assembles a response in a particular format (JSON, XML, HTML).
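A request handler can be pictured as a function that takes the request and returns a response. The sketch below is an assumption-heavy toy: `PAGES` and `handle_request` are invented names, and real handlers in Apache or IIS are configured very differently.

```python
import json

# Hypothetical content the server knows how to return.
PAGES = {"/": "<h1>Hello</h1>"}


def handle_request(method: str, path: str):
    """Return (status_code, content_type, body) for a request."""
    if method != "GET":
        # Only GET is supported in this sketch; report the problem as JSON.
        return 405, "application/json", json.dumps({"error": "method not allowed"})
    body = PAGES.get(path)
    if body is None:
        return 404, "application/json", json.dumps({"error": "not found"})
    return 200, "text/html", body


print(handle_request("GET", "/"))
print(handle_request("GET", "/missing"))
```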
The server response contains the web page you requested as well as the status code, compression type, how to cache the page, any cookies to set, privacy information, etc.
If you look closely at the first line of the response, it shows a status code. This is quite important, as it tells us the status of the response. There are five classes of status, identified by the first digit of a numerical code.
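To see where those pieces sit, here is a sketch that splits a raw response into its status line, headers, and body. The `SAMPLE` response and the `parse_response` helper are invented for illustration; real responses carry more headers and often a compressed body.

```python
# A made-up response; the cookie value and body are placeholders.
SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Cache-Control: private, max-age=0\r\n"
    "Set-Cookie: session=abc123\r\n"
    "\r\n"
    "<html>...</html>"
)


def parse_response(raw: str):
    """Return (status_code, reason, headers, body) from a raw HTTP response."""
    # A blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; repeated names overwrite here.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


code, reason, headers, body = parse_response(SAMPLE)
print(code, reason)
print(headers["cache-control"])
```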
1. 1xx indicates an informational message only
2. 2xx indicates success of some kind
3. 3xx redirects the client to another URL
4. 4xx indicates an error on the client’s part
5. 5xx indicates an error on the server’s part
(So, if you ever encounter an error of some kind, you can take a look at the HTTP response to check what type of status code you have received.)
And that’s it! Enjoy your searches!
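The five classes above depend only on the first digit of the code, which a few lines of code can show. The `status_class` function name and the label strings are just for this example.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to one of the five classes listed above."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 keeps only the first digit.
    return classes[code // 100]


print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```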