Recognition of intentions is an unconscious cognitive process vital to human communication. This skill enables anticipation and increases interactive exchanges quality between humans. Within the context of engagement, i.e. intention for interaction, non-verbal signals are used to communicate this intention to the partner. In this paper, we investigated methods to detect these signals in order to allow a robot to know when it is about to be addressed. Classically, the human position and speed, the human-robot distance are used to detect the engagement. Our hypothesis is that this method is not enough in a context of a home environment. The chosen approach integrates multimodal features gathered using a robot enhanced with a Kinect. The evaluation of this new method of detection on our corpus collected in spontaneous conditions highlights its robustness and validates use of such technique in real environment. Experimental validation shows that the use of multimodal sensors gives better precision and recall than the detector using only spatial and speed features. We also demonstrate that 7 multimodal features are sufﬁcient to provide a good engagement detection score.