Building Cloud Services: HTTP


This series of blog posts functions as my notes as I follow along with the Building Cloud Services with the Java Spring Framework course on Coursera.

HTTP is a protocol that allows the fetching of resources, such as HTML documents. It is the foundation of any data exchange on the Web, and it is a client-server protocol, which means requests are initiated by the recipient, usually the Web browser. Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client, usually a Web browser, are called requests, and the messages sent by the server as an answer are called responses.

HTTP Request
- Uniform Resource Locator (URL) and Query Parameters
- Request Body Encoding
  - MIME Types
HTTP Response
- Cookies
Automatically Updating Clients
REST Interfaces

HTTP Request

Request Anatomy:

Request Line (Start Line)
- Request methods such as:
  - GET - Retrieve resource from server.
  - POST - Store data on server.
  - PUT - Store data on server. The difference between PUT and POST is that PUT is idempotent: calling it once or several times in succession has the same effect (that is, repeated calls cause no additional side effects), whereas successive identical POST requests may have additional effects, like placing an order several times.
  - DELETE - Delete resource on server.
- Resource Path - such as "index.html".
Headers - Extra info to help servers figure out the right way to process the request
- Language
- Character Set
- Content Type
- Cookies
Body (Optional)
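To make the PUT/POST distinction concrete, here is a minimal in-memory sketch. The `OrderStore` class, the `/orders` paths in the comments and the generated IDs are all hypothetical; the point is only that repeating an identical PUT leaves the store in the same state, while repeating an identical POST adds another entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory order store illustrating PUT vs POST semantics.
class OrderStore {
    private final Map<String, String> orders = new LinkedHashMap<>();
    private int nextId = 1;

    // PUT /orders/{id}: the client names the resource and sends its full state.
    // Repeating the identical call overwrites the same entry, so the end state is unchanged.
    void put(String id, String order) {
        orders.put(id, order);
    }

    // POST /orders: the server assigns a new ID on every call,
    // so repeating the identical call places the order again.
    String post(String order) {
        String id = "order-" + nextId++;
        orders.put(id, order);
        return id;
    }

    int size() {
        return orders.size();
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.put("order-100", "1 book");
        store.put("order-100", "1 book");
        System.out.println(store.size()); // 1: PUT repeated, same state
        store.post("1 book");
        store.post("1 book");
        System.out.println(store.size()); // 3: each POST added an order
    }
}
```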

Headers are meta information about the request, whereas the Body is the core data being sent to the server so it can process the request. If you left out optional Headers but included the Body, the server could often still process the Request; it just might not respond in the format, or exactly the way, that you expected. But if the server needed the Body and you didn't include it, the server wouldn't be able to process the Request at all. In short, the Body is the data the client sends that the server absolutely has to have in order to complete the Request.
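As a rough sketch of what actually travels over the wire, the helper below assembles the raw text of an HTTP/1.1 request: the request line (method, path, protocol version), one `Name: value` header per line, a blank line, and then the optional body. The class name, host, path and header values are made up for illustration, and a real client library would handle this for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: builds the raw text of an HTTP/1.1 request so the
// three parts (request line, headers, optional body) are visible.
class RawHttpRequest {

    static String build(String method, String path, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        // 1. Request line: method, resource path, protocol version
        sb.append(method).append(' ').append(path).append(" HTTP/1.1\r\n");
        // 2. Headers: one "Name: value" pair per line
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        if (body != null) {
            // Content-Length tells the server how many body bytes to read
            sb.append("Content-Length: ")
              .append(body.getBytes(StandardCharsets.UTF_8).length)
              .append("\r\n");
        }
        // A blank line separates the headers from the body
        sb.append("\r\n");
        // 3. Body (optional)
        if (body != null) {
            sb.append(body);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "example.com");          // hypothetical host
        headers.put("Accept-Language", "en-US");      // language header
        headers.put("Content-Type", "text/plain; charset=UTF-8"); // content type + charset
        System.out.print(build("POST", "/notes", headers, "hello"));
    }
}
```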

Uniform Resource Locator (URL) and Query Parameters

A URL takes the form: http://host:port/path?a=1&b=2

Where,

Host - The server that we are communicating with
Port - The specific port on the host to connect to
Path - The location of (or path to) the resource on the server
"?" - Denotes the start of the query parameters section
"a=1" - A query parameter that means "the key 'a' has value 1"
"&" - Separates query parameters so that more than one can be included (in practice, servers and browsers limit the total URL length)

Because the server uses the special characters "?", "=", "&" and others to parse these URLs, including these characters elsewhere in the URL (for example, inside a parameter value) can lead to confusion. When these characters are needed as plain characters rather than delimiters, they are replaced with their URL-encoded (percent-encoded) equivalents, such as "%26" for "&".
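A small sketch of building such a URL in Java: each key and value is passed through the JDK's `java.net.URLEncoder` before being joined with "=" and "&", so a "?" or "&" inside a value cannot be mistaken for a delimiter. The `QueryUrl` helper and the example parameters are hypothetical. Note that `URLEncoder` performs form encoding, so a space becomes "+" rather than "%20", and this sketch does not encode the path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: builds "http://host:port/path?key=value&..." with each key and
// value percent-encoded so "?", "=", "&" inside them stay plain characters.
class QueryUrl {

    static String build(String host, int port, String path, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        String url = "http://" + host + ":" + port + path;
        return params.isEmpty() ? url : url + "?" + query;
    }

    // Form encoding: "&" -> "%26", "?" -> "%3F", "=" -> "%3D", space -> "+"
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "1");
        params.put("q", "cats & dogs?");
        // prints http://localhost:8080/search?a=1&q=cats+%26+dogs%3F
        System.out.println(build("localhost", 8080, "/search", params));
    }
}
```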

Request Body Encoding

The two most typical types of body encoding are:

URL Encoded - Encodes key-value pairs the same way as query parameters (e.g. a=1&b=2). Used for small amounts of simple data.
Multipart - Message with multiple parts, each can have its own identifier (key) and its own MIME type. Typically used for sending large amounts of data to a server.
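To see what a multipart body looks like, here is a text-only sketch of a `multipart/form-data` body: every part starts with a boundary line, carries its own name (key) and its own Content-Type, and the body ends with the boundary followed by "--". The request itself would then be sent with a header like `Content-Type: multipart/form-data; boundary=XyZ`. The boundary, field names and the `MultipartBody` class are assumptions for illustration; real uploads would usually carry binary parts and use a library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Text-only sketch of a multipart/form-data body: each part has its own
// key and MIME type, and parts are separated by a boundary string.
class MultipartBody {

    // One part's MIME type and (text) content.
    static final class Part {
        final String contentType;
        final String content;

        Part(String contentType, String content) {
            this.contentType = contentType;
            this.content = content;
        }
    }

    static String build(String boundary, Map<String, Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Part> e : parts.entrySet()) {
            sb.append("--").append(boundary).append("\r\n");
            // The part's identifier (key)
            sb.append("Content-Disposition: form-data; name=\"").append(e.getKey()).append("\"\r\n");
            // The part's own MIME type
            sb.append("Content-Type: ").append(e.getValue().contentType).append("\r\n");
            sb.append("\r\n");
            sb.append(e.getValue().content).append("\r\n");
        }
        // The closing boundary has two extra dashes
        sb.append("--").append(boundary).append("--\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Part> parts = new LinkedHashMap<>();
        parts.put("title", new Part("text/plain", "My video"));
        parts.put("metadata", new Part("application/json", "{\"duration\": 300}"));
        System.out.print(build("XyZ", parts));
    }
}
```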

MIME Types

Imagine an image download service on an HTTP-based server: depending on which image you access, it could be in a different format, such as JPEG or PNG. A server could also send back plain text that the client is meant to display as-is, or HTML that the client is expected to render into a user interface. In all cases, the client and server need a way to tell each other how to interpret the data.

While it is good practice for the resource identifier (URL) to include the correct file extension, it doesn't have to. A server may want to offer certain resources in several formats that work interchangeably, without coupling the URL to the exact format in which the data is returned.

HTTP does this through MIME types, which are identifiers for particular types or formats of information. For the cases above, they look like the following:

image/jpeg
image/png
text/plain
text/html

These types describe the format of the body, and the server or client looks at them to decide how to interpret the data. They are carried in the Content-Type header of an HTTP message, in both requests and responses.
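One way a server can offer the same URL in several formats is content negotiation: the client lists the MIME types it accepts in HTTP's Accept request header, and the server picks one and reports its choice in the response's Content-Type header. Below is a deliberately simplified sketch of that choice. The `ContentNegotiation` class is hypothetical, and it ignores q-value weights and partial wildcards such as `image/*`, which a real implementation would honour.

```java
import java.util.List;

// Simplified content negotiation: return the first MIME type from the
// Accept header that the server can produce (q-values are ignored).
class ContentNegotiation {

    static String choose(String acceptHeader, List<String> available) {
        for (String entry : acceptHeader.split(",")) {
            // Drop parameters such as ";q=0.8" and surrounding whitespace
            String type = entry.split(";")[0].trim();
            if (type.equals("*/*") && !available.isEmpty()) {
                return available.get(0); // client accepts anything
            }
            if (available.contains(type)) {
                return type; // goes into the response's Content-Type header
            }
        }
        return null; // no match; a real server could answer 406 Not Acceptable
    }

    public static void main(String[] args) {
        List<String> formats = List.of("image/jpeg", "image/png");
        System.out.println(choose("image/webp, image/png;q=0.9", formats));
    }
}
```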

HTTP Response

HTTP responses have a structure similar to requests and are composed of:

Status Line (Start Line) - How the server communicates whether something notable happened while processing our request, such as the file having been moved or our not being allowed to access a resource.
- Response Code - It is important to understand how to interpret HTTP response codes so that we can write logic into our clients to respond appropriately to what happens on the server. These codes range from 100 to 599 and are grouped into five classes by their first digit:
  - 1xx informational response – the request was received, continuing process
  - 2xx successful – the request was successfully received, understood, and accepted
  - 3xx redirection – further action needs to be taken in order to complete the request
  - 4xx client error – the request contains bad syntax or cannot be fulfilled
  - 5xx server error – the server failed to fulfil an apparently valid request
- Response Phrase or Text - Textual description of the reason for the response code
- For example:
  - 102 Processing
  - 200 OK
  - 301 Moved Permanently
  - 404 Not Found
  - 500 Internal Server Error
Headers
Body
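Since the first digit determines the class, client logic can branch on the class rather than on every individual code. A minimal sketch, with a hypothetical `StatusClass` helper:

```java
// Sketch: map an HTTP status code to its class using the first digit.
class StatusClass {

    static String classify(int code) {
        if (code < 100 || code > 599) {
            throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "successful";
            case 3: return "redirection";
            case 4: return "client error";
            default: return "server error"; // 5xx
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {102, 200, 301, 404, 500}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

A client might then, for example, retry later on any server error but fix its request on a client error, without enumerating every code.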

Cookies

The client generally drives the communication: it initiates each exchange and decides which messages to send to the server. However, it is occasionally helpful for the server to give the client a hint about data it would like the client to provide in future requests. For example, when you log in to Amazon or a banking website, the site gives you some data indicating that you have logged in, which can later prove to that server that you are the same client that has been sending a series of requests. HTTP does this through a mechanism called cookies.

Cookies are very small pieces of data that the server sends to the client, asking the client to remember them on its behalf and send them back in future requests. They are delivered to the client via a Set-Cookie header in the response. On future requests, the client places the cookie in the Cookie header of its request so that the server can automatically identify it. Cookies can also have an expiration time attached to them, for example to limit how long a user can stay logged in while inactive. We can also specify (with the Secure attribute) that a cookie may only be sent over a secure HTTPS connection, where the contents of the message are encrypted.
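Here is a simplified sketch of the client side of this exchange: a tiny cookie jar that records cookies from `Set-Cookie` response headers and produces the value of the `Cookie` request header, skipping cookies that have expired (via `Max-Age`) or that are marked `Secure` when the connection isn't HTTPS. The `CookieJar` class is hypothetical, and it assumes one cookie per header and ignores domain/path matching and the `Expires` attribute; in practice, the JDK's `java.net.CookieManager` or a framework handles this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified client-side cookie jar (one cookie per Set-Cookie header,
// no domain/path matching, Max-Age is the only expiry attribute handled).
class CookieJar {

    private static final class Cookie {
        final String value;
        final long expiresAtMillis; // Long.MAX_VALUE means no expiry was given
        final boolean secure;       // only send over HTTPS

        Cookie(String value, long expiresAtMillis, boolean secure) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
            this.secure = secure;
        }
    }

    private final Map<String, Cookie> cookies = new LinkedHashMap<>();

    // Store a cookie from a header value such as "session=abc123; Max-Age=3600; Secure"
    void receive(String setCookieHeader, long nowMillis) {
        String[] parts = setCookieHeader.split(";");
        String[] nameValue = parts[0].trim().split("=", 2);
        long expiresAt = Long.MAX_VALUE;
        boolean secure = false;
        for (int i = 1; i < parts.length; i++) {
            String attr = parts[i].trim();
            if (attr.equalsIgnoreCase("Secure")) {
                secure = true;
            } else if (attr.regionMatches(true, 0, "Max-Age=", 0, 8)) {
                expiresAt = nowMillis + Long.parseLong(attr.substring(8)) * 1000;
            }
        }
        String value = nameValue.length > 1 ? nameValue[1] : "";
        cookies.put(nameValue[0], new Cookie(value, expiresAt, secure));
    }

    // Value for the Cookie request header, e.g. "session=abc123; theme=dark"
    String cookieHeader(boolean https, long nowMillis) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, Cookie> e : cookies.entrySet()) {
            Cookie c = e.getValue();
            boolean expired = nowMillis >= c.expiresAtMillis;
            if (!expired && (https || !c.secure)) {
                pairs.add(e.getKey() + "=" + c.value);
            }
        }
        return String.join("; ", pairs);
    }
}
```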

Automatically Updating Clients

One of the issues with the client driving communication is that the client also decides when to check the server for updates. If there is an important update on the server that the client needs to know about, there is no easy way for the server to notify the client at the moment it happens.

There are a few ways to design around this limitation:

Event Driven Models. The client fetches information only when an event, typically a user action, calls for it, such as refreshing a webpage or the pull-down-to-refresh action typically found in mobile apps. This is useful because users can decide intelligently when they want an update and aren't going to keep refreshing manually over an extended period of time, which conserves server resources.
Polling. The client repeatedly asks the server, at set time intervals, whether there are any new updates. To save resources, this approach should adapt to the responses the client receives from the server. A typical method is:
- Client requests updates at interval T.
- After x number of responses from the server with no new information, the client starts making requests at interval 2T.
- After x number of responses from the server with no new information, the client starts making requests at interval 4T.
- ... (and so on) ...
- Repeat until a given ceiling for the interval is reached, such as once every 24 hours.
- If an update is received, set interval back to T.
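The adaptive schedule above can be sketched as a small class that is told after each poll whether the response contained anything new and returns the interval to wait before the next poll. The `AdaptivePoller` name and the example values are hypothetical; a real client would plug the returned interval into a timer, for example `ScheduledExecutorService.schedule`.

```java
// Sketch of adaptive polling: start at T, double the interval after x empty
// responses in a row, cap it at a ceiling, and reset to T on any update.
class AdaptivePoller {
    private final long baseMillis;       // T
    private final long ceilingMillis;    // e.g. 24 hours
    private final int emptyBeforeBackoff; // x
    private long currentMillis;
    private int emptyCount;

    AdaptivePoller(long baseMillis, long ceilingMillis, int emptyBeforeBackoff) {
        this.baseMillis = baseMillis;
        this.ceilingMillis = ceilingMillis;
        this.emptyBeforeBackoff = emptyBeforeBackoff;
        this.currentMillis = baseMillis;
    }

    // Call after each poll; returns how long to wait before the next one.
    long onResponse(boolean hadUpdate) {
        if (hadUpdate) {
            currentMillis = baseMillis; // back to T
            emptyCount = 0;
        } else if (++emptyCount >= emptyBeforeBackoff) {
            currentMillis = Math.min(currentMillis * 2, ceilingMillis); // T, 2T, 4T, ...
            emptyCount = 0;
        }
        return currentMillis;
    }

    public static void main(String[] args) {
        AdaptivePoller poller = new AdaptivePoller(1_000, 5_000, 2);
        for (int i = 0; i < 6; i++) {
            System.out.println(poller.onResponse(false)); // 1000, 2000, 2000, 4000, 4000, 5000
        }
        System.out.println(poller.onResponse(true)); // 1000
    }
}
```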
WebSockets. A WebSocket creates a persistent, two-way connection between the client and the server. It lets us replace the per-request overhead of HTTP with a custom, more efficient protocol, and it allows the server to send updates to the client whenever it needs to.

However, this comes with some baggage, such as the requirement of a constant network connection. If the connection drops, logic is needed to handle the resulting errors and to reconnect to the server. And while WebSockets do allow more efficient protocols to be used, keeping a connection open for every client adds load on the server, which could limit the number of clients that can use the service simultaneously.
Push Messaging/Notifications. This is one of the most common approaches for getting information or updates from the server to the client. For example, every Android client automatically sets up a persistent connection to the Google Cloud Messaging (GCM) servers (GCM has since been replaced by Firebase Cloud Messaging) using XMPP, an XML-based messaging protocol that was originally used for chat messaging:
1. An Android client makes an HTTP request to register for a connection ID.
2. GCM responds with an ID for the client.
3. GCM and client establish an XMPP connection.
4. The client forwards its ID to your App server.
5. When an update is available, your server sends the client ID and a message to GCM.
6. GCM sends the client the message, which, let's assume, triggers the client to poll your server over HTTP.
7. Your server responds to that request with the new data.
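To see how the steps fit together, here is a toy in-memory simulation of the push-then-poll flow. Every class in it is hypothetical: `PushService` stands in for GCM, `AppServer` for your server and `Device` for the Android client, and plain method calls replace the HTTP and XMPP connections. No real GCM or Firebase API is used. The key design point it shows is that the push message is only a small "updates available" signal, while the data itself comes from your server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy simulation of push notifications followed by a poll (hypothetical classes).
class PushFlowDemo {

    static final class PushService { // plays the role of GCM
        private final Map<String, Device> connections = new HashMap<>();

        // Steps 1-3: the device registers, gets an ID, and stays connected
        String register(Device device) {
            String id = UUID.randomUUID().toString();
            connections.put(id, device);
            return id;
        }

        // Step 6: deliver a small message over the open connection
        void send(String clientId, String message) {
            Device d = connections.get(clientId);
            if (d != null) {
                d.onPush(message);
            }
        }
    }

    static final class AppServer { // plays the role of your server
        private final List<String> clientIds = new ArrayList<>();
        private final List<String> updates = new ArrayList<>();
        private final PushService push;

        AppServer(PushService push) {
            this.push = push;
        }

        // Step 4: the client forwards its ID
        void registerClient(String id) {
            clientIds.add(id);
        }

        // Step 5: store the update and send only a signal through the push service
        void publish(String update) {
            updates.add(update);
            for (String id : clientIds) {
                push.send(id, "updates-available");
            }
        }

        // Step 7: answer the client's poll with everything it hasn't seen yet
        List<String> fetchUpdatesSince(int index) {
            return new ArrayList<>(updates.subList(index, updates.size()));
        }
    }

    static final class Device { // plays the role of the Android client
        final List<String> received = new ArrayList<>();
        AppServer server;

        void onPush(String message) {
            // The push only triggers a poll; the data comes from the app server
            if (message.equals("updates-available")) {
                received.addAll(server.fetchUpdatesSince(received.size()));
            }
        }
    }

    public static void main(String[] args) {
        PushService push = new PushService();
        AppServer server = new AppServer(push);
        Device device = new Device();
        device.server = server;
        server.registerClient(push.register(device));
        server.publish("video 1 uploaded");
        System.out.println(device.received); // [video 1 uploaded]
    }
}
```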

REST Interfaces

Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. It was introduced and defined in 2000 by Roy Fielding in his doctoral dissertation, with a very specific set of requirements.

In practice, the term usually refers to a service with a hierarchical URL addressing scheme, where interactions become more specific the deeper we go into the hierarchy. For example, in a video hosting service:

/video
- GET: A list of all videos
- PUT: Create or replace a video collection
- POST: Create a video collection
/video/1
- GET: A specific video
- PUT: Add or replace a video in the collection
- POST: Add a video to the collection
/video/1/duration
- GET: A specific part of the video
- PUT: Add or replace a section of the video
- POST: Add a section to the video
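As a rough sketch of how the GET side of this hierarchy maps onto code, the hypothetical `VideoResource` class below splits the path into segments and becomes more specific with each one: `/video` returns the collection, `/video/{id}` one video, and `/video/{id}/duration` one property of it. The `Video` fields, the sample data and the `"<status> <body>"` return format are assumptions for illustration. In a Spring application, each of these routes would more typically be a controller method annotated with `@RequestMapping` (or `@GetMapping`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the GET routes in the /video hierarchy.
// Responses are returned as "<status code> <body>" strings for simplicity.
class VideoResource {

    static final class Video {
        final String title;
        final int durationSeconds;

        Video(String title, int durationSeconds) {
            this.title = title;
            this.durationSeconds = durationSeconds;
        }
    }

    private final Map<Integer, Video> videos = new LinkedHashMap<>();

    void add(int id, Video video) {
        videos.put(id, video);
    }

    String get(String path) {
        // "/video/1/duration" -> ["video", "1", "duration"]
        String[] seg = path.replaceAll("^/+|/+$", "").split("/");
        if (!seg[0].equals("video")) {
            return "404 Not Found";
        }
        if (seg.length == 1) { // GET /video: the whole collection
            List<String> titles = new ArrayList<>();
            for (Video v : videos.values()) {
                titles.add(v.title);
            }
            return "200 " + String.join(", ", titles);
        }
        Video video;
        try {
            video = videos.get(Integer.parseInt(seg[1]));
        } catch (NumberFormatException e) {
            video = null; // non-numeric ID
        }
        if (video == null) {
            return "404 Not Found";
        }
        if (seg.length == 2) { // GET /video/{id}: one video
            return "200 " + video.title;
        }
        if (seg.length == 3 && seg[2].equals("duration")) { // GET /video/{id}/duration
            return "200 " + video.durationSeconds;
        }
        return "404 Not Found";
    }
}
```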