Graph attention networks (GATs) are a graph neural network architecture that relies on attention mechanisms to focus on the most important neighbouring nodes. A GAT first learns an attention score between two connected nodes, i.e., the contribution weight of each neighbour node. It then aggregates the neighbour features using attention coefficients (a scalar importance value for each neighbour node). Finally, it uses multi-head attention (i.e., several independent attention mechanisms) to capture different aspects of the graph.

These steps can be expressed mathematically as follows. First, a shared neural network computes an unnormalised attention score between two connected nodes:
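
In the standard GAT formulation (Veličković et al., 2018), which is assumed here, the score for node $i$ and its neighbour $j$ is usually written as:

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right)$$

Here $\mathbf{h}_i$ is the feature vector of node $i$, $\mathbf{W}$ is a weight matrix shared by all nodes, $\mathbf{a}$ is a learnable attention vector, and $\Vert$ denotes concatenation. The symbol names follow that paper and are an assumption about the notation intended here.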

Then the attention scores are normalised with a softmax over each node's neighbourhood:
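
Assuming the usual GAT notation, where $e_{ij}$ is the unnormalised score between node $i$ and neighbour $j$ and $\mathcal{N}_i$ is the set of neighbours of node $i$, the normalisation can be written as:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$

The resulting attention coefficients $\alpha_{ij}$ are positive and sum to 1 over the neighbours of each node, so they can be compared across neighbourhoods of different sizes.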

Then the node embeddings are updated as a weighted sum of the neighbour features, using the normalised attention coefficients:
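
In the standard GAT formulation (assumed here), with attention coefficients $\alpha_{ij}$, shared weight matrix $\mathbf{W}$, neighbour features $\mathbf{h}_j$, neighbourhood $\mathcal{N}_i$ and a nonlinearity $\sigma$, the update is usually written as:

$$\mathbf{h}_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\right)$$

With multi-head attention, this update is computed by several independent heads, each with its own $\mathbf{W}$ and $\mathbf{a}$, and the head outputs are typically concatenated. The following is a minimal pure-Python sketch of these steps, not a reference implementation. The function names (`gat_head`, `multi_head_gat`), the toy graph, and the random weights are illustrative assumptions. The output nonlinearity $\sigma$ is omitted for brevity, and self-loops are assumed so that each node also attends to itself.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    """LeakyReLU; a negative slope of 0.2 is a common choice for GATs."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_head(features, neighbours, W, a):
    """One attention head: score, softmax-normalise, aggregate.

    features:   list of input vectors h_i, one per node
    neighbours: neighbours[i] lists the nodes j that node i attends to
                (self-loops are assumed to be included)
    W:          shared weight matrix, shape (out_dim, in_dim)
    a:          attention vector of length 2 * out_dim
    """
    z = [matvec(W, h) for h in features]  # W h_i for every node
    out_dim = len(W)
    new_features, coefficients = [], []
    for i, nbrs in enumerate(neighbours):
        # Step 1: e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Step 2: softmax over the neighbourhood (shifted by the max for stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Step 3: h_i' = sum_j alpha_ij W h_j (output nonlinearity omitted)
        h_new = [
            sum(al * z[j][d] for al, j in zip(alpha, nbrs))
            for d in range(out_dim)
        ]
        new_features.append(h_new)
        coefficients.append(alpha)
    return new_features, coefficients


def multi_head_gat(features, neighbours, heads):
    """Concatenate the outputs of several independent heads [(W, a), ...]."""
    outputs = [gat_head(features, neighbours, W, a)[0] for W, a in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(features))]


# Toy example: a 3-node path graph 0 - 1 - 2 with self-loops.
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = [[0, 1], [0, 1, 2], [1, 2]]
heads = [
    ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)],  # W: 3 x 2
     [rng.uniform(-1, 1) for _ in range(6)])                      # a: 2 * 3
    for _ in range(2)
]
embeddings = multi_head_gat(features, neighbours, heads)  # 3 nodes, 2 heads
_, alphas = gat_head(features, neighbours, *heads[0])     # coefficients of head 0
```

Each node ends up with one output vector per head, concatenated into a single embedding, and the coefficients in `alphas[i]` sum to 1 over the neighbours of node `i`. Practical implementations vectorise these loops over the edge list and learn `W` and `a` by gradient descent; the explicit loops here are only meant to keep the three steps visible.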