The Internet
How does the internet work? A simplistic summary for web developers.
Web developers benefit from understanding in depth how the Internet works. That knowledge pays off in the long run as projects grow more complex.
In the simplest terms, the Internet is a global network made up of smaller networks that are interconnected using standardized communication protocols. When a message is sent from Computer 1 to Computer 2 via the Internet, it passes down the protocol stack, where each layer transforms it according to its own protocol and the message is broken down into packets of information. The protocol stack consists of the Application Layer (formats the message as an HTTP/SMTP/FTP request), the TCP Layer (adds a TCP header that specifies the port to use, so that the message can be delivered to the correct application), and the IP Layer (adds an IP header that specifies the addresses of the source and destination computers).
After the protocol stack, the message is converted to binary data and finally to electrical signals, and is sent to the nearest router. If a router finds the destination IP address in one of its subnetworks, the message is directed to that subnetwork; otherwise the router forwards the message to the nearest router "above" it. Once the message reaches the destination computer, it goes through the protocol stack again in reverse order and is converted from binary data back to packets and then to the original message.
This is a good starting point for learning about the Internet.
The following is more or less a summary of this slightly technical whitepaper from Stanford and some MDN docs on this topic.
The Internet vs The Web
The Internet is a global network of networks, while the Web, formally the World Wide Web (WWW), is a collection of information that is accessed via the Internet. Another way to look at the difference: the Internet is infrastructure, while the Web is a service on top of that infrastructure. Alternatively, the Internet can be viewed as a big bookstore, while the Web can be viewed as the collection of books in that store.
Internet Addresses
Because the Internet is a global network of computers, each computer connected to the Internet must have a unique address in order to communicate with the others. Internet addresses are in the form nnn.nnn.nnn.nnn, where nnn must be a number from 0 to 255. This address is known as an IP address, more specifically an IPv4 address. Because the Internet has far exceeded the scale its designers imagined, IPv4 is not enough to support all the computers that are, or are going to be, connected to the Internet: there are only about 4.3 billion (2^32) possible IPv4 addresses. To solve this, IPv6 was introduced. Its addresses are 128 bits long, as opposed to 32 bits for IPv4, so it supports a vastly larger number of addresses (2^128, roughly 3.4 x 10^38).
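As a rough illustration of the address format and the size of the two address spaces, here is a minimal Python sketch using the standard-library `ipaddress` module. The address `192.0.2.10` is a documentation-only example, and `is_valid_ipv4` is a helper name made up for this sketch.

```python
import ipaddress

# Parse a dotted-quad string; ipaddress raises ValueError for octets outside 0-255.
addr = ipaddress.ip_address("192.0.2.10")  # documentation-only example address
print(addr.version)   # 4
print(int(addr))      # the same address viewed as one 32-bit integer

# Size of each address space, derived from the address width in bits.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: about {ipv6_total:.1e} addresses")


def is_valid_ipv4(text: str) -> bool:
    """Return True if text is a well-formed IPv4 address (hypothetical helper)."""
    try:
        return isinstance(ipaddress.ip_address(text), ipaddress.IPv4Address)
    except ValueError:
        return False
```

For example, `is_valid_ipv4("256.1.1.1")` is false because 256 does not fit in one 8-bit field.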
The ping command shows the IP address of a specific computer connected to the Internet, along with some more information. To check it out, fire up your terminal/command prompt and enter ping www.google.com.
Protocol Stacks & Packets
In order to send a message from one computer to another via the Internet, the message must be translated from text into electrical signals, transmitted over the Internet, then translated back into text. This is accomplished by using a protocol stack. The protocol stack used on the Internet is referred to as the TCP/IP protocol stack because of the two major communication protocols used. The TCP/IP stack consists of the following:
- Applications Layer: Protocols specific to applications such as WWW, e-mail, FTP, etc.
- Transmission Control Protocol Layer (TCP): TCP directs packets to a specific application on a computer using a port number.
- Internet Protocol Layer (IP): IP directs packets to a specific computer using an IP address.
- Hardware Layer: Converts binary packet data to network signals and back.
So the path of a message sent from Computer 1 to Computer 2 will look something like this:
Computer 1: message -> Application Layer -> TCP Layer -> IP Layer -> Hardware Layer -> Internet -> Hardware Layer -> IP Layer -> TCP Layer -> Application Layer -> message : Computer 2
If the message to be sent is long, each stack layer that the message passes through may break the message up into smaller chunks of data. On the Internet, these chunks of data are known as packets.
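The trip down and back up the stack can be sketched in a few lines of Python. This is a toy model, not real networking code: the headers are plain dictionaries, the hardware layer is left out, `CHUNK_SIZE` is unrealistically small so the splitting is visible, and the function names and addresses are made up for illustration.

```python
CHUNK_SIZE = 8  # bytes per chunk; tiny on purpose so the message is split


def send_down_stack(message, src_ip, dst_ip, dst_port):
    """Sender side: wrap a text message layer by layer into toy packets."""
    data = message.encode("utf-8")  # application layer: text -> bytes
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    packets = []
    for chunk in chunks:
        tcp_segment = {"dst_port": dst_port, "data": chunk}               # TCP layer adds the port
        ip_packet = {"src": src_ip, "dst": dst_ip, "payload": tcp_segment}  # IP layer adds addresses
        packets.append(ip_packet)
    return packets


def receive_up_stack(packets):
    """Receiver side: strip the headers in reverse order and rebuild the text."""
    segments = [packet["payload"] for packet in packets]   # IP layer removes its header
    data = b"".join(segment["data"] for segment in segments)  # TCP layer removes its header
    return data.decode("utf-8")                             # application layer: bytes -> text


packets = send_down_stack("Hello from Computer 1!", "192.0.2.1", "198.51.100.2", 80)
print(len(packets), "packets")
print(receive_up_stack(packets))
```

This version assumes the packets arrive complete and in order; handling loss and reordering is TCP's job.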
Networking Infrastructure
After your packets traverse the phone network and your ISP's local equipment, they are routed onto the ISP's backbone or a backbone the ISP buys bandwidth from. From here the packets will usually journey through several routers and over several backbones, dedicated lines, and other networks until they find their destination.
The traceroute command shows the path your packets are taking to a given Internet destination. To check it out, fire up your terminal/command prompt and enter traceroute www.google.com (on Windows, the command is tracert).
Internet Infrastructure
The Internet backbone is made up of many large networks which interconnect with each other. These large networks are known as Network Service Providers or NSPs. These networks peer with each other to exchange packet traffic. Each NSP is required to connect to three Network Access Points or NAPs. At the NAPs, packet traffic may jump from one NSP's backbone to another NSP's backbone. NSPs also interconnect at Metropolitan Area Exchanges or MAEs. MAEs serve the same purpose as the NAPs but are privately owned. NAPs were the original Internet interconnect points. Both NAPs and MAEs are referred to as Internet Exchange Points or IXs. NSPs also sell bandwidth to smaller networks, such as ISPs and smaller bandwidth providers.
The Internet Routing Hierarchy
Routers are packet switches. A router is usually connected between networks to route packets between them. Each router knows about its sub-networks and which IP addresses they use. The router usually doesn't know what IP addresses are 'above' it. When a packet arrives at a router, the router examines the destination IP address put there by the IP protocol layer on the originating computer and checks its routing table. If the network containing the IP address is found, the packet is sent to that network. If the network containing the IP address is not found, the router sends the packet on a default route, usually up the backbone hierarchy to the next router.
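The lookup described above can be sketched with Python's `ipaddress` module. The networks, the next-hop labels, and the `next_hop` function are invented for illustration. When several known networks contain the address, this sketch picks the most specific one (the longest prefix), which is how routers commonly break ties, although the text above does not go into that detail.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to send the packet).
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "subnet-A"),
    (ipaddress.ip_network("10.1.2.0/24"), "subnet-B"),
]
DEFAULT_ROUTE = "upstream-router"  # the router 'above' this one


def next_hop(destination: str) -> str:
    """Return the next hop for a destination IP address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        # Unknown network: hand the packet to the default route.
        return DEFAULT_ROUTE
    # Several networks may contain the address; prefer the longest prefix.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("10.1.2.5"))     # subnet-B
print(next_hop("10.1.9.9"))     # subnet-A
print(next_hop("203.0.113.7"))  # upstream-router
```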
Domain Names and Address Resolution
The Domain Name System (DNS) is a distributed database which keeps track of computers' names and their corresponding IP addresses on the Internet. Many computers connected to the Internet host part of the DNS database and the software that allows others to access it. These computers are known as DNS servers. No DNS server contains the entire database; each contains only a subset of it. If a DNS server does not contain the domain name requested by another computer, the DNS server redirects the requesting computer to another DNS server. The Domain Name System is structured as a hierarchy similar to the IP routing hierarchy. The computer requesting a name resolution will be redirected 'up' the hierarchy until a DNS server is found that can resolve the domain name in the request.
When an Internet connection is set up, one primary and one or more secondary DNS servers are usually specified as part of the installation. This way, any Internet applications that need domain name resolution will be able to function correctly. For example, when you enter a web address into your web browser, the browser first connects to your primary DNS server. After obtaining the IP address for the domain name you entered, the browser then connects to the target computer and requests the web page you wanted.
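To make the "ask one server, get redirected to another" idea concrete, here is a toy Python model of the simplified hierarchy described above. It is not how a real resolver is implemented: the server names, the records, and the `resolve` function are all hypothetical, and the IP addresses come from documentation-only ranges.

```python
# Each toy DNS server holds a subset of the records and knows its parent.
SERVERS = {
    "local": {"records": {"intranet.example": "192.0.2.5"}, "parent": "isp"},
    "isp": {"records": {"www.example.com": "198.51.100.34"}, "parent": "top"},
    "top": {"records": {"www.example.org": "203.0.113.7"}, "parent": None},
}


def resolve(name, start="local"):
    """Follow redirects up the toy hierarchy; return (ip_or_None, servers_asked)."""
    asked = []
    server = start
    while server is not None:
        asked.append(server)
        ip = SERVERS[server]["records"].get(name)
        if ip is not None:
            return ip, asked
        server = SERVERS[server]["parent"]  # redirected to the next server up
    return None, asked


print(resolve("www.example.com"))  # found at the second server asked
print(resolve("no-such-host.example"))  # nobody knows the name
```

In real programs you would normally let the operating system do the lookup, for example with Python's `socket.getaddrinfo(host, port)`, rather than walking servers yourself.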
Internet Protocols
HTTP (Applications Layer)
One of the most commonly used services on the Internet is the World Wide Web (WWW). The application protocol that makes the web work is Hypertext Transfer Protocol, or HTTP. It is a stateless, text-based protocol: each request is handled independently of earlier ones. Clients (web browsers) send requests to web servers for web elements such as web pages and images. In the original HTTP/1.0 model, the connection between client and server is closed after the server has serviced the request, and a new connection must be made for each request. HTTP/1.1 and later can keep a connection open and reuse it for several requests.
When you type a URL into a web browser, this is what happens:
- If the URL contains a domain name, the browser first connects to a domain name server and retrieves the corresponding IP address for the web server.
- The web browser connects to the web server and sends an HTTP request (via the protocol stack) for the desired web page.
- The web server receives the request and checks for the desired page. If the page exists, the web server sends it. If the server cannot find the requested page, it will send an HTTP 404 error message. (404 means 'Not Found', as anyone who has surfed the web probably knows.)
- The web browser receives the page back and the connection is closed.
- The browser then parses through the page and looks for other page elements it needs to complete the web page. These usually include images, applets, etc.
- For each element needed, the browser makes additional connections and HTTP requests to the server.
- When the browser has finished loading all images, applets, etc. the page will be completely loaded in the browser window.
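Because HTTP is text based, the request and the first line of the response in the steps above are easy to build and read by hand. The following Python sketch shows roughly what they look like on the wire. The function names are made up for illustration, and no network connection is opened.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the connection afterwards
        "",                   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Split the first response line into (version, status code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("www.example.com", "/index.html").decode("ascii"))
print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```

To actually send the request, the bytes could be written to a TCP connection to port 80 of the server, for instance one opened with `socket.create_connection`.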
More about HTTP here.
SMTP (Applications Layer)
Another commonly used Internet service is electronic mail. E-mail uses an application-level protocol called Simple Mail Transfer Protocol, or SMTP, to send messages. (Reading mail from a mailbox is handled by other protocols, such as POP3 or IMAP.) SMTP is also a text-based protocol, but unlike HTTP, SMTP is connection-oriented.
When you use your mail client to send an e-mail, this is what typically happens:
- The mail client (Gmail, Microsoft Outlook, etc.) opens a connection to its default mail server. The mail server's IP address or domain name is typically set up when the mail client is installed.
- The mail server will always transmit the first message to identify itself.
- The client will send an SMTP HELO command, to which the server will respond with a 250 OK message.
- The client then sends the appropriate SMTP commands for the task (for example, MAIL FROM, RCPT TO, and DATA to send a message), and the server responds to each.
- This request/response transaction will continue until the client sends an SMTP QUIT command. The server will then say goodbye and the connection will be closed.
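The shape of that conversation can be imitated with a tiny Python simulation. This is only a sketch of the dialogue, not a mail server: it understands just HELO and QUIT, the host names are placeholders, and `smtp_server_reply` and `run_session` are names invented for this example. The reply codes (220, 250, 221, 500) are standard SMTP codes.

```python
def smtp_server_reply(command: str) -> str:
    """Return the toy server's reply to one client command."""
    verb = command.strip().split(" ", 1)[0].upper()
    if verb == "HELO":
        return "250 OK"
    if verb == "QUIT":
        return "221 Bye"
    return "500 Command not recognized"


def run_session(commands):
    """Play a list of client commands against the toy server; return the transcript."""
    transcript = ["S: 220 mail.example.com ready"]  # the server always speaks first
    for command in commands:
        transcript.append(f"C: {command}")
        reply = smtp_server_reply(command)
        transcript.append(f"S: {reply}")
        if reply.startswith("221"):
            break  # the server said goodbye, so the connection closes
    return transcript


for line in run_session(["HELO client.example.com", "QUIT"]):
    print(line)
```

Unlike the HTTP exchange, the connection stays open across several commands and only ends when the client sends QUIT.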
TCP - Transmission Control Protocol
Under the application layer in the protocol stack is the TCP layer. When applications open a connection to another computer on the Internet, the messages they send (using a specific application layer protocol) get passed down the stack to the TCP layer. TCP is responsible for delivering application data to the correct application on the destination computer.
TCP works like this:
- When the TCP layer receives the application layer protocol data from above, it segments it into manageable 'chunks' and then adds a TCP header with specific TCP information to each 'chunk'. The information contained in the TCP header includes the port number of the application the data needs to be sent to.
- When the TCP layer receives a packet from the IP layer below it, the TCP layer strips the TCP header data from the packet, does some data reconstruction if necessary, and then sends the data to the correct application using the port number taken from the TCP header.
IP - Internet Protocol
Unlike TCP, IP is an unreliable, connectionless protocol. IP doesn't care whether a packet gets to its destination or not. Nor does IP know about connections and port numbers. IP's job is to send and route packets to other computers. IP packets are independent entities and may arrive out of order or not at all. It is TCP's job to make sure packets arrive and are in the correct order. About the only thing IP has in common with TCP is the way it receives data and adds its own IP header information to the TCP data.
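Here is a small Python sketch of how a receiver could put out-of-order chunks back together and notice missing ones. It is a deliberate simplification: each chunk carries a simple counter as its sequence number (real TCP numbers bytes, not chunks, and also handles acknowledgements and retransmission), and `reassemble` is a name made up for this example.

```python
def reassemble(segments, expected_count):
    """Rebuild data from (sequence_number, data) pairs in arrival order.

    Returns (data, []) when everything arrived, or (None, missing_numbers)
    when some chunks would have to be sent again.
    """
    received = {}
    for seq, data in segments:
        received.setdefault(seq, data)  # keep the first copy, ignore duplicates
    missing = [seq for seq in range(expected_count) if seq not in received]
    if missing:
        return None, missing
    ordered = b"".join(received[seq] for seq in range(expected_count))
    return ordered, []


# Chunks arrive out of order, and chunk 0 arrives twice.
print(reassemble([(2, b"c"), (0, b"a"), (1, b"b"), (0, b"a")], 3))
# Chunk 1 never arrives.
print(reassemble([(2, b"c"), (0, b"a")], 3))
```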
The Complete Packet
So, to summarize - the application layer data is segmented in the TCP layer, the TCP header is added, the packet continues to the IP layer, the IP header is added, and then the packet is transmitted across the Internet. The complete packet consists of the IP header, followed by the TCP header, followed by the data from the application layer.