In a Turing test, a judge decides whether their conversation partner is a machine or a human. What cues does the judge use to determine this? In particular, are features presumed to be unique to human language actually perceived as humanlike? Participants rated the humanness of a set of sentences that were manipulated for grammatical construction (linear right-branching or hierarchical center-embedded) and for plausibility with regard to world knowledge.
We found that center-embedded sentences are perceived as less humanlike than right-branching sentences, and that more plausible sentences are regarded as more humanlike. However, the effect of sentence plausibility on perceived humanness is smaller for center-embedded sentences than for right-branching sentences.
Participants also rated a conversation in which the agent used the context either correctly or incorrectly. No effect of context use was found. Finally, participants rated a full transcript of either a real human or a real chatbot. As expected, chatbots were reliably perceived as less humanlike than real humans. We did, however, find individual differences between chatbots and humans.