Humans use gestures as a means of non-verbal communication. Often accompanying speech, these gestures serve several purposes but, in general, aim to convey an intended message to the receiver. Researchers have tried to develop systems that allow embodied agents to communicate better with humans by using gestures. In this article, we present a scoping literature review of the methods and metrics used to generate and evaluate co-speech gestures. After collecting a set of papers through a term search on the Scopus database, we analysed their content in terms of methodology (i.e., the model and the dataset used), evaluation measures (i.e., objective and subjective) and limitations. The results indicate that data-driven approaches are used more frequently than other approaches. In terms of evaluation measures, we found a trend of combining objective and subjective metrics, although no standards exist for either. This literature review provides an overview of research in the area and, more specifically, insights into the trends and the challenges to be met in building a system that automatically generates gestures for embodied agents.