Collocations: using collocation dictionaries and AI
When choosing collocations to teach, I often used references like Oxford Collocations. Now AI offers help, but is it any better? And are collocations actually what we should search for?

In recent times I have done less of this kind of searching for a variety of reasons. Firstly, I tend to do this more when I am writing coursebooks, where I am maybe a bit more considered in my choices. Secondly, in class, I think you get better at using your ‘intuition’ if you have regularly tested yourself and done research into collocations. And thirdly, I have somewhat changed my attitude away from a simple focus on collocation towards thinking more about the networks of words around a word or collocation.
Having said this, in recent times I have started using some AI tools (in the form of Large Language Model search engines like ChatGPT or DeepSeek) to research collocations for our Spanish classes, where I need more support again. This made me wonder about how well the new technology stood up to the old (the collocation dictionary) in English. You might think there is no competition, but I would encourage you to read on. It’s not an open-and-shut case! Furthermore, if you do use the LLM search engine, there are also lessons to be learned about the prompts you use.
Collocations of adjectives
One thing I always found a bit frustrating with Oxford Collocations is that the listings under an adjective only have entries for adverb collocates plus some linking verbs (see image below). I should say I have not used the new online version, so it may have changed from the paper version.
I understand why the dictionary may not include this information. The editors are trying to reduce the size of the dictionary, and having a list of nouns would be a repetition because ‘complex’ is listed (for example) under the entries for these nouns. Ultimately, it might be argued that our meanings, as we speak, are more driven by the noun and then the adjective modification of the noun, rather than the other way round. It therefore makes sense to restrict the listing in this way. Furthermore, if we take the noun ‘relationship’, ‘complex’ collocates strongly when we are talking about the relationship between two things/processes but not when talking about the relationship between people, and these distinctions of meaning may be lost by simply listing the nouns that ‘complex’ collocates with. Still, as a teacher, if a student asks for the meaning of an adjective, I do want to be able to give typical NOUN collocates, and as this dictionary is for teachers and learners, it remains a frustration.
When you ask ChatGPT for 6 common collocations with ‘complex’, that’s exactly what you get: a list of nouns. It gives a complex … problem, issue, system, process, structure, relationship. Note that these are not necessarily the most common collocations, even if that is in the prompt, because ChatGPT acknowledges the need for corpus analysis to confirm relative frequency. If you ask DeepSeek, for example, the list may vary.
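For anyone curious what a very basic frequency check might look like, here is a minimal sketch in Python. It only counts the word that comes immediately after ‘complex’ in a text. The function name right_collocates is my own, and the sample sentences are invented purely for illustration; they are not real corpus data, and a proper corpus tool does far more than this.

```python
import re
from collections import Counter


def right_collocates(text, node):
    """Count the words that appear immediately after `node` in `text`."""
    # Crude tokenisation: lowercase words only, sentence boundaries ignored.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for current, following in zip(tokens, tokens[1:]):
        if current == node:
            counts[following] += 1
    return counts


# Invented sample text for illustration only, NOT real corpus data.
sample = (
    "It is a complex problem. The tax system is a complex system. "
    "This is a complex issue and a complex problem for everyone."
)

counts = right_collocates(sample, "complex")
print(counts.most_common(3))
```

With a real corpus in place of the sample, the same count would give a rough ranking to compare against the list the AI offers.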
Benefits of collocation dictionaries: nouns

Where a dictionary may be better is in its listings for nouns. Let’s take the example from the Oxford Collocations dictionary for the two meanings of the noun complex. If you ask AI for collocations of the noun complex, what it gives you is a list of adjectives that go with the psychological meaning of complex. Of course, we could specify that we want the building meaning, but then ChatGPT generally produces a list of the types of complex (residential, sports, factory, etc.) rather than the adjectives to describe its size, which are in the dictionary listing. In neither case will the AI offer the verb or preposition collocates that are shown here. Of course, we can also ask for these, but if we want this same information from AI we need to ask something like: give me a range of different collocations – adjectives, verbs, prepositions, adverbs of the chosen adjectives – that go with the noun X (meaning Y) and the same for the noun (meaning Z), and limit the total number of collocations to a set number. In this case, we might get something close to the dictionary, but then you have to ask yourself what the point of the AI was. By using the dictionary I might also be a bit more eco-friendly (AI searches are estimated – at least in this article – to be 30 times more energy intensive than a traditional search engine).
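If you find yourself typing this kind of longer prompt repeatedly, it could be kept as a reusable template. The following is only a sketch in Python: the function name build_collocation_prompt and its parameters are hypothetical, the wording simply follows the prompt described above, and it builds the text without sending it to any AI service.

```python
def build_collocation_prompt(noun, meaning, types, limit):
    """Return a prompt asking for several collocate types for one sense of a noun."""
    type_list = ", ".join(types)
    return (
        f"Give me a range of different collocations ({type_list}) "
        f"that go with the noun '{noun}' (meaning: {meaning}). "
        f"Limit the total number of collocations to {limit}."
    )


# Example values are illustrative; swap in your own noun, sense and limit.
prompt = build_collocation_prompt(
    "complex",
    "a group of buildings",
    ["adjectives", "verbs", "prepositions"],
    12,
)
print(prompt)
```

Naming the sense and the collocate types explicitly is what brings the answer closer to a dictionary entry; the limit keeps the list manageable for class.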
Is collocation actually what we should be prompting for?
Having said all this, you might want to consider whether collocations are what we really want to be asking for – at least in terms of how collocation is defined by the dictionary or AI. At lower levels, what students need to make use of words is often word combinations that will not strictly be considered as collocations. For example, if students learn the word coffee, they really need things like: (do you) want a coffee / (shall we) have a coffee / I need a coffee / (can you/forgot to) buy/get coffee / a cup of coffee / (It’s a bit) expensive / (I like it/it’s very/quite) strong / what kind? / (not very) nice / (do you want) milk? / (I / I will) have it black / (do you) take sugar / (Is it) fresh etc. These might be described as things people say, examples or grammaticalized lexis rather than collocations. However, this is closer to what students will want to say and involves more frequent, useful language for their level than the list of collocations AI produces, which includes brew, instant, sip and stain.
If we think about teaching complex at higher levels (the Oxford advanced dictionary gives the adjective a B1 rating, the buildings noun a B2 rating and the psychological condition a C2 rating), we might also apply the same kind of prompt to these other words: give me ten things you would say using the word complex meaning building. What’s interesting is that it reveals somewhat different ‘collocates’. So, for example, we can see that the building complex now collocates with includes/features (a whole host of facilities / a cinema, a skate park, a swimming pool), covers (several acres / the area of 100 football pitches) and used to be (a military base before being converted into a museum). The complex can be located somewhere (e.g. on the outskirts of town) or something else (an office / a pool / an apartment) might be located within the complex. Of course, you as the teacher might need to draw attention to these word combinations – perhaps by creating a gap-fill task (or getting the AI to do it for you).
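A gap-fill of this kind is also simple to put together yourself. Here is a minimal sketch in Python, assuming you have already chosen the sentences and the word to blank out in each. The function name make_gap_fill is hypothetical, and the sentences are my own, built from the combinations mentioned above.

```python
def make_gap_fill(sentence, target):
    """Replace the first occurrence of `target` in `sentence` with a gap."""
    if target not in sentence:
        raise ValueError(f"'{target}' not found in: {sentence}")
    return sentence.replace(target, "_____", 1)


# Each pair is (sentence, word to remove); illustrative examples only.
items = [
    ("The complex includes a cinema and a swimming pool.", "includes"),
    ("The complex covers several acres.", "covers"),
    ("The complex is located on the outskirts of town.", "located"),
]

gapped = [make_gap_fill(sentence, target) for sentence, target in items]
for number, line in enumerate(gapped, start=1):
    print(f"{number}. {line}")
```

The removed words can then be given to students in a box above the sentences, in the usual way.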

I am not saying that these are all the right collocations to teach, but it’s interesting to note how these examples differ from those produced by the prompt for collocations. I think they are often better graded for intermediate levels (and below). Also, by being rooted in what people may say, these examples might make it easier for teachers to think of questions that get students to consider usage or to talk about examples from their own lives.
I couldn’t agree more, as is usually the case. I find that ChatGPT—and to an even greater extent, Perplexity—do a reasonably good job when asked to look for co-text (moving beyond mere collocations). As you say, intuition, expertise, a strong command of the language, and practice are the best GenAI tools—meaning yourself. However, it’s also fair to say that a significant number of language teachers are not in that fortunate position. So, GenAI tools can come to their rescue at certain points in a lesson, helping them explore lexis in a principled way when needed.
Yes, as I mentioned, part of my interest in the LLM search engines is that I have been teaching Spanish and my level is B2/C1. I have definitely found them of use, particularly in the absence of a collocations dictionary for Spanish.
Actually there is one. A rare thing. I have it in paper. My students brought it for me from Spain about 20 years ago.
As a non-native teacher of English I totally agree. I use AI for reformulation of what my students have said in class, based on my notes. I always ask the AI to rewrite the students’ sentences in “lexically rich, spoken British English at C1 level”, which usually produces the desired result, although sometimes some manual tweaking is necessary. I learn a lot that way, too.
Thanks for the comment and an interesting prompt. I’d be interested to see the changes it makes.
Very thought-provoking, Andrew.
Perhaps collocations – in the strict sense of the word – aren’t crucial for acquiring a word like “coffee” and, instead, what’s necessary is ‘grammaticalised lexis’ to make use of it. But for many words, collocational knowledge is key – whether it’s commonly confused pairs (make/do), abstract verbs with elusive meanings (pursue, seek) or words with highly restricted usage (pungent, mitigating). Not to mention that collocational knowledge is essential for fluency.
I happened to read this on the same day as an article by Mark Davies – Mr. Corpus himself (BYU) – who also extols the virtues of AI vis-a-vis traditional corpora, admitting he was surprised by how good LLMs are! He still emphasizes the unique strengths of corpus data, which is attested, consistent and verifiable/falsifiable (all of the things LLMs are not). Interestingly, he finds LLMs “exceptionally good” at collocations (!). He also makes a point about LLMs being better at categorising than generating data.
As a materials writer, I often turn to AI (ChatGPT, Gemini or, more recently, Claude). But look at this AI-generated assessment task (a list of lexical items including target collocations was provided by me). I’m not sure “exceptionally good” is the collocation (!) I’d use to describe the output:
Add these verbs to the correct groups: bring, put, do, make (4 pts)
1 _______
people together
something together
a decision
2 _______
a reference
a mistake
homework
3 _______
something on a label
together
in stages
4 _______
knowledge
goods
a performance
Make of it what you will. But, of course, the AI output is only as good as the prompt… so there’s that 🙂
Hi Leo,
Thanks for the detailed and interesting comment. I’m increasingly thinking that within a lexical approach we should be differentiating more between different levels and how we apply a lexical focus to our teaching. I also think it might be worth distinguishing between how we ‘present’ and ‘practise’ vocabulary in the first instance and how we revise/deepen students’ knowledge. As you say, at the low levels (A1/A2), I would be emphasising grammaticalised lexis above all (especially receptively). Where we explore collocation, it is as much to do with recycling the individual words as ensuring accuracy. At intermediate levels, as I say, I would be thinking more about starting with an outcome and working backwards to a selection of collocations for that outcome, or starting with a collocation and thinking about how that might be ‘texted’ or the story that would come out of it. The kind of deeper exploration of the varied collocates of individual words and the unusual words that collocate narrowly seems to me more something for revision, self-study and advanced classes. Underlying these ideas is also an issue around how highly we value accuracy at different levels. To my mind, accuracy – whether grammatical or collocational – is just not an appropriate global goal for a low-level student, because their communication will be inaccurate no matter what we do. I’ll probably articulate this more fully in future blog posts, but I hope you get my drift.