Sockets are a form of inter-process communication that lets processes exchange data, whether they run on the same machine or on different machines connected by a network.

There are four main steps to use sockets as a server:

  • socket: creates the socket.
  • bind: attaches the socket to some location (e.g., a file path, or an IP address with a port).
  • listen: indicates that we’re accepting connections, and sets the limit on queued connections (beyond which new connections may be dropped).
  • accept: returns the next incoming connection for us to handle.
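
The four server steps above can be sketched in C roughly as follows. This is a minimal illustration, not a production server: the helper names make_listener and serve_one, the loopback address, the backlog of 16, and the echo behaviour are all assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a TCP socket listening on 127.0.0.1:port,
 * or -1 on error. Passing port 0 lets the kernel pick a free port. */
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* 1. socket */
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* 2. bind */
        listen(fd, 16) < 0) {                                   /* 3. listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: handles one client by echoing back its first message.
 * Returns 0 on success, -1 on error. */
int serve_one(int listen_fd) {
    int conn = accept(listen_fd, NULL, NULL);            /* 4. accept (blocks) */
    if (conn < 0) return -1;

    char buf[256];
    ssize_t n = read(conn, buf, sizeof buf);
    if (n > 0 && write(conn, buf, (size_t)n) != n) n = -1;
    close(conn);
    return n < 0 ? -1 : 0;
}
```

A real server would typically call serve_one (or its equivalent) in a loop, and keep the listening descriptor open between clients.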

As a client, we only need two steps:

  • socket: creates the socket.
  • connect: connects to some location, and the socket can now send/receive data.
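
As a rough sketch of the two client steps, assuming an IPv4 TCP server: the helper name connect_to and its parameters are made up for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connects to ip:port over TCP.
 * Returns a connected socket descriptor, or -1 on error. */
int connect_to(const char *ip, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* 1. socket */
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* inet_pton returns 1 only when ip is a valid dotted-quad string. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {  /* 2. connect */
        close(fd);
        return -1;
    }
    return fd;  /* ready for read/write (or send/recv) */
}
```

Once connect_to returns a descriptor, the client can use read and write on it like any other file descriptor, and should close it when done.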

Basics

Stream (SOCK_STREAM) sockets use TCP, where all data sent by a client arrives in the same order on the server. They form a persistent connection between the client and server. They’re reliable, but the connection setup, acknowledgements, and retransmissions add overhead, so they may be slower. Telnet and SSH use stream sockets.

Datagram (SOCK_DGRAM) sockets use UDP. They send individual messages between the client and server without a persistent connection (i.e., they’re “connectionless”). They’re fast, but messages may be reordered or dropped altogether.
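
To illustrate the connectionless style, here is a small sketch using sendto on the loopback interface. The helper names make_udp_socket and send_datagram are assumptions for this example. Note that there is no listen, accept, or connect: each message carries its destination address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: UDP socket bound to 127.0.0.1:port (0 = any free port).
 * Returns the descriptor, or -1 on error. */
int make_udp_socket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: sends one datagram to 127.0.0.1:port.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_datagram(int fd, uint16_t port, const void *msg, size_t len) {
    struct sockaddr_in dest;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dest.sin_port = htons(port);
    return sendto(fd, msg, len, 0, (struct sockaddr *)&dest, sizeof dest);
}
```

The receiving side would call recvfrom on its bound socket. Over a real network the application has to cope with lost or reordered messages itself.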

In UNIX systems

Main headers to include:

#include <sys/socket.h> // socket syscalls and structs
#include <sys/types.h> // ssize_t
#include <netdb.h> // getaddrinfo and struct addrinfo

Sockets define several types (mainly structs) to use:

  • int describes socket descriptors (much like file descriptors).
  • struct addrinfo — used to prepare socket address structures.
  • struct sockaddr — holds the socket address information.
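
As a sketch of how the address structs are usually filled in by hand (without struct addrinfo): the helper names make_unix_addr and make_ipv4_addr are hypothetical. The family-specific structs (struct sockaddr_un, struct sockaddr_in) are what we fill out, and a pointer to one is cast to struct sockaddr * when passed to a syscall.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fills a local (AF_UNIX) address from a file path.
 * Returns 0 on success, -1 if the path does not fit in sun_path. */
int make_unix_addr(struct sockaddr_un *out, const char *path) {
    if (strlen(path) >= sizeof out->sun_path) return -1;
    memset(out, 0, sizeof *out);
    out->sun_family = AF_UNIX;
    strcpy(out->sun_path, path);  /* safe: length checked above */
    return 0;
}

/* Hypothetical helper: fills an IPv4 address from a dotted-quad string and a
 * port. Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port) {
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);  /* ports are stored in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

Either struct can then be passed along as (struct sockaddr *)&addr together with sizeof addr.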

We have a few relevant syscalls:

  • int getaddrinfo() — resolves a host and service name, and fills the addrinfo structs out.
  • int socket(int domain, int type, int protocol) — creates a socket and returns its descriptor (or -1 on error).
    • domain — general protocol, further specified with the protocol input (which is mostly unused).
      • AF_UNIX — for local communication on the same physical machine.
      • AF_INET — for IPv4 protocol using your network interface.
      • AF_INET6 — for IPv6, similarly with network interface.
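    • type — either stream or datagram sockets, i.e., SOCK_STREAM or SOCK_DGRAM.
    • protocol — specifies whether to use TCP or UDP. We can usually set this to 0, and the syscall picks the default protocol for the given domain and type (with AF_INET, TCP for SOCK_STREAM and UDP for SOCK_DGRAM). We can also pass an explicit value, such as: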
      • getprotobyname("udp")->p_proto
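  • int bind(int socket, const struct sockaddr *address, socklen_t address_len) — binds the socket file descriptor to a given address (e.g., an IP and port, or a file path). Servers need this so that clients know where to connect; clients usually skip it and let the kernel pick a local address.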
    • socket — file descriptor returned from the socket syscall.
    • address — pointer to a sockaddr structure [1], in practice one of:
      • struct sockaddr_un — for local communication, i.e., just a path.
      • struct sockaddr_in — for IPv4.
      • struct sockaddr_in6 — for IPv6.
  • int listen(int socket, int backlog)
    • backlog — limit of outstanding connections. This queue is managed by the kernel.
  • int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len) — blocks until a new connection arrives.
    • address and address_len — locations where the connecting peer’s address is written. They act as an optional return value; we can set both to NULL to ignore it.
    • Returns: a new file descriptor for that connection, which we can read from or write to. The listening socket stays open for further accept calls.
  • int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
    • sockfd — file descriptor returned by the socket syscall. Client needs to be using the same protocol and type as the server.
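
Putting getaddrinfo, socket, and connect together, a client might look roughly like the sketch below. The helper name connect_by_name is hypothetical, and the choice of AF_UNSPEC with SOCK_STREAM is just one reasonable set of hints. The values that getaddrinfo returns are passed straight into socket and connect, so the code does not need to know whether it got an IPv4 or IPv6 address.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes getaddrinfo under strict -std= modes */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolves host/service (e.g. "127.0.0.1", "8080") and
 * connects to the first address that works.
 * Returns a connected descriptor, or -1 on failure. */
int connect_by_name(const char *host, const char *service) {
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM;  /* stream socket, i.e. TCP */

    if (getaddrinfo(host, service, &hints, &res) != 0) return -1;

    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {
        /* domain, type and protocol all come from the resolved entry */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;  /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);  /* the list is heap-allocated by getaddrinfo */
    return fd;
}
```

A server can use the same pattern with bind in place of connect, usually after setting AI_PASSIVE in hints.ai_flags.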

Resources

  • Beej’s Guide to Network Programming, by Brian Hall

Footnotes

  1. “Computers have less manners than people” - Prof Eyolfson