A remote procedure call (RPC) is used to communicate between two processes running on different nodes (i.e., different computers in a distributed system). It uses the procedure call interface (much like regular function calls or syscalls), where a caller suspends execution until the RPC has returned.

The core idea is that RPCs simplify client-server communication by providing a layer of abstraction above regular socket programming. That layer hides transport details, such as how common data structures are converted to and from a packet format. RPC libraries also allow for portability between machines. Basically, we care about the logic (just that we send and receive RPCs) and not the implementation (including differences in hardware, language, or data types).

Implementation

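The process is as follows:

  • The client-side stub and RPC library marshal (convert) the call and its arguments into a network format and send the packet.
  • The server side then receives the packet, unmarshals it, and calls the remote machine’s handler function.
  • The handler’s return value travels back the same way: the server side marshals it into a response packet, and the client-side stub unmarshals it and returns it to the caller.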

The RPC library typically needs the server location to be resolved already (in Go, the Dial function takes the network type and the server address as arguments). The library also matches each response to the request that produced it.

Marshalling is also an open detail that’s up to the implementation.
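As a rough illustration of the marshal, send, unmarshal, and dispatch steps, here is a minimal sketch using Go’s encoding/gob, the encoding that net/rpc uses by default. The Request type, the "Add" handler, and the helper names are all hypothetical, and the network hop is replaced by passing a byte slice directly.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Request is a hypothetical wire format: the procedure name plus its arguments.
type Request struct {
	Method string
	A, B   int
}

// marshal plays the role of the client-side stub: it encodes the call as bytes.
func marshal(req Request) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(req); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unmarshal plays the role of the server side: it decodes bytes back into a call.
func unmarshal(packet []byte) (Request, error) {
	var req Request
	err := gob.NewDecoder(bytes.NewReader(packet)).Decode(&req)
	return req, err
}

// dispatch runs the handler named in the request.
func dispatch(req Request) (int, error) {
	switch req.Method {
	case "Add":
		return req.A + req.B, nil
	default:
		return 0, fmt.Errorf("unknown method %q", req.Method)
	}
}

// roundTrip chains the steps together. A real stub would write the packet
// to a socket between marshal and unmarshal.
func roundTrip(req Request) (int, error) {
	packet, err := marshal(req)
	if err != nil {
		return 0, err
	}
	decoded, err := unmarshal(packet)
	if err != nil {
		return 0, err
	}
	return dispatch(decoded)
}

func main() {
	sum, err := roundTrip(Request{Method: "Add", A: 2, B: 3})
	fmt.Println(sum, err)
}
```

The caller only ever sees roundTrip, which is the point of the abstraction: the encoding could be swapped for another format without changing the calling code.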

Failures

As with anything in distributed systems, we should assume that RPCs can fail. The network is generally unreliable, and nodes can crash and reboot, so if a response doesn’t arrive after a client request, we don’t know whether the server has executed the request or not.

In the case of a request failure, the request never reached the server, so the server didn’t execute it. For a response failure, the server executed the request but the client didn’t get a response. And for a server failure, the server may or may not have executed the request (who knows!). From the client’s side, all three look the same: no response.

A best-effort RPC just keeps retrying when it gets no response, until a set timeout or a maximum number of tries. The obvious problem is that the server may execute the action multiple times (think of a banking system: you send a debit request for $10, but you could end up debiting more than that). This works best for read-only operations with no side effects, or when the application itself handles duplicate and reordered requests. If not, requests may execute multiple times or out of order. This is also called at-least-once semantics.
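The double-debit problem can be reproduced with a small simulation. This is only a sketch under stated assumptions: the server, lossyLink, and callWithRetry names are hypothetical, the "network" is an in-process call that always delivers the request but drops a configurable number of replies, and an error return stands in for a client-side timeout.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoResponse = errors.New("no response before timeout")

// server holds the account balance (hypothetical server-side state).
type server struct{ balance int }

func (s *server) debit(amount int) int {
	s.balance -= amount
	return s.balance
}

// lossyLink simulates a network that delivers every request
// but loses the first dropReplies responses.
type lossyLink struct {
	srv         *server
	dropReplies int
}

func (l *lossyLink) debit(amount int) (int, error) {
	balance := l.srv.debit(amount) // the server executes the request
	if l.dropReplies > 0 {
		l.dropReplies--
		return 0, errNoResponse // the reply is lost on the way back
	}
	return balance, nil
}

// callWithRetry is the best-effort client: retry until success or maxTries.
func callWithRetry(maxTries int, call func() (int, error)) (int, error) {
	lastErr := errNoResponse
	for try := 0; try < maxTries; try++ {
		reply, err := call()
		if err == nil {
			return reply, nil
		}
		lastErr = err
	}
	return 0, lastErr
}

// demo sends one logical $10 debit over a link that loses one reply.
func demo() (reply, balance int, err error) {
	srv := &server{balance: 100}
	link := &lossyLink{srv: srv, dropReplies: 1}
	reply, err = callWithRetry(3, func() (int, error) { return link.debit(10) })
	return reply, srv.balance, err
}

func main() {
	reply, balance, err := demo()
	fmt.Println(reply, balance, err)
}
```

With a starting balance of 100 and one lost reply, the single $10 debit is applied twice and the balance ends at 80, because the retry cannot tell a lost reply from a lost request.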

At-most-once RPC semantics guarantee that the server executes a request no more than once; from the client’s point of view, the request may or may not have executed. The client still resends a request when it gets no answer, but every resend carries the same unique identifier, so the server can detect duplicates and skip re-executing them.
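Server-side duplicate detection might look roughly like the following sketch. The request and server types are hypothetical, and the table of seen identifiers is an in-memory map that caches the reply so a resend gets the original answer; a real implementation would also need a policy for discarding old entries.

```go
package main

import (
	"fmt"
	"sync"
)

// request carries a unique ID that the client reuses on every resend.
type request struct {
	ID     uint64
	Amount int
}

// server remembers the reply it gave for each request ID.
type server struct {
	mu      sync.Mutex
	balance int
	replies map[uint64]int // request ID -> cached reply
}

func newServer(balance int) *server {
	return &server{balance: balance, replies: make(map[uint64]int)}
}

// debit executes a request at most once; duplicates get the cached reply.
func (s *server) debit(req request) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	if reply, seen := s.replies[req.ID]; seen {
		return reply // duplicate: do not execute again
	}
	s.balance -= req.Amount
	s.replies[req.ID] = s.balance
	return s.balance
}

func main() {
	srv := newServer(100)
	req := request{ID: 1, Amount: 10}
	first := srv.debit(req)
	resend := srv.debit(req) // same ID, so the debit is not applied again
	next := srv.debit(request{ID: 2, Amount: 10})
	fmt.Println(first, resend, next)
}
```

Caching the reply, rather than just recording that the ID was seen, lets the server answer a resend with the same result the original execution produced.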

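A server crash and restart wipes any duplicate-detection state held in memory. Two options:

  • Keep the array of requests in memory, but use an epoch number. Reject a request if the client’s epoch number is too different from the server’s.
  • Keep the array on disk. Each update needs to be written atomically.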
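The epoch option could be sketched as follows. Several things here are assumptions rather than part of the notes: the server bumps its epoch on every restart, a request is rejected whenever its epoch is not exactly the server’s current one, and the balance is treated as durable while the duplicate table is not. All type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errStaleEpoch = errors.New("request is from a different server epoch")

// request is tagged with the server epoch the client last learned.
type request struct {
	Epoch  uint64
	ID     uint64
	Amount int
}

type server struct {
	epoch   uint64         // assumed to be incremented on every restart
	balance int            // assumed to survive a crash
	replies map[uint64]int // in-memory duplicate table, lost on a crash
}

func newServer(balance int) *server {
	return &server{epoch: 1, balance: balance, replies: make(map[uint64]int)}
}

// restart simulates a crash: the duplicate table is gone and the epoch advances.
func (s *server) restart() {
	s.epoch++
	s.replies = make(map[uint64]int)
}

// debit refuses requests from another epoch, because the server can no
// longer tell whether it already executed them before the crash.
func (s *server) debit(req request) (int, error) {
	if req.Epoch != s.epoch {
		return 0, errStaleEpoch
	}
	if reply, seen := s.replies[req.ID]; seen {
		return reply, nil
	}
	s.balance -= req.Amount
	s.replies[req.ID] = s.balance
	return s.balance, nil
}

func main() {
	srv := newServer(100)
	req := request{Epoch: 1, ID: 7, Amount: 10}
	reply, err := srv.debit(req)
	fmt.Println(reply, err)
	srv.restart()
	reply, err = srv.debit(req) // resend after the crash is rejected, not re-run
	fmt.Println(reply, err)
}
```

Rejecting the stale request keeps the at-most-once guarantee: the client gets an error and has to decide what to do, instead of the server silently debiting a second time.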

Exactly-once RPC lets the client keep retrying forever if there is no response. The server side then has to perform duplicate detection and either handle server crashes itself or use a fault-tolerant service.

Language-specific

In Go

Go’s RPC package is provided in the standard library as net/rpc. It works over raw network connections such as TCP, and it can also serve RPCs over HTTP.
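For a concrete feel, here is a minimal net/rpc sketch over TCP on the loopback interface. The Arith service, the Args type, and the startServer and add helpers are made up for illustration; the net/rpc calls themselves (NewServer, Register, Accept, Dial, Call) are from the standard library.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Args is the argument type for the example service.
type Args struct{ A, B int }

// Arith is a hypothetical service. net/rpc exports methods of the form
// func (t *T) Method(args T1, reply *T2) error.
type Arith struct{}

func (a *Arith) Add(args *Args, sum *int) error {
	*sum = args.A + args.B
	return nil
}

// startServer registers Arith and serves it on a loopback port picked by the OS.
func startServer() (string, error) {
	srv := rpc.NewServer()
	if err := srv.Register(new(Arith)); err != nil {
		return "", err
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go srv.Accept(ln) // handles incoming connections in the background
	return ln.Addr().String(), nil
}

// add dials the server and makes one blocking call.
func add(addr string, a, b int) (int, error) {
	client, err := rpc.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer client.Close()
	var sum int
	err = client.Call("Arith.Add", &Args{A: a, B: b}, &sum)
	return sum, err
}

func main() {
	addr, err := startServer()
	if err != nil {
		log.Fatal(err)
	}
	sum, err := add(addr, 2, 3)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(sum)
}
```

Call blocks until the reply arrives, which matches the procedure-call model described at the top. Note that net/rpc itself does not retry or deduplicate, so the failure-handling semantics discussed earlier are left to the application.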