[SOLVED] How can I map all words in the description of a Film to all films' names that contain the word in their description fast?


I have a List<Film> films, where each film has an int id and a String description.
My task is to map each word that appears in any description to the IDs of all films whose description contains that word. The result should look something like this:

<word1>: <filmId11>, <filmId12>,..., <filmId1N>
<word2>: <filmId21>, <filmId22>, ..., <filmId2N>

I did it using Java Stream API:

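// Assumed declaration: the post uses allWords but does not show where it is defined.
private Set<String> allWords = new HashSet<>();
private List<Map.Entry<String, String>> wordToFilmIds;

// Collects every distinct word from all descriptions.
private void addWordsFromDescriptions(List<Film> films) {
    for (Film film : films) {
        String description = film.description();
        String[] tokens = description.split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+");
        allWords.addAll(Arrays.asList(tokens));
    }
}

// For every word, scans every film again: this nested scan is the slow part.
private void mapWordsToFilmIDs(List<Film> films) {
    wordToFilmIds = allWords.stream()
            .map(word -> Map.entry(word,
                    films.stream()
                            .filter(film -> film.description().contains(word))
                            .map(film -> String.valueOf(film.id()))
                            .collect(Collectors.joining(", "))))
            .collect(Collectors.toList());
}
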

But my solution is too slow, and I have to work with large inputs: there are about 12,000 films and the descriptions are not short. Also, I am not permitted to use multi-threading.
Any idea how I can optimise it?
Right now the program does not finish.

I also tried using parallel streams, but that did not work either.


I think the problem is that you iterate over every film for every word, which makes the solution roughly O(n^2). It can be done in a single pass over the films, though:

Given the helper class:

public class Tuple<A, B> {
    public A a;
    public B b;

    public Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}

Try this:

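    Map<String, Set<Integer>> addWordsFromDescriptions(List<Film> films) {
        return films.stream()
                // pair every token with the film it came from
                .flatMap(film -> tokenizeDescription(film).map(token -> new Tuple<>(token, film)))
                // group by token and collect the ids of the films that contain it
                .collect(Collectors.groupingBy(
                        tuple -> tuple.a,
                        Collectors.mapping(tuple -> tuple.b.id(), Collectors.toSet())
                ));
    }
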
    private Stream<String> tokenizeDescription(Film film) {
        return Stream.of(film.description().split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+"));
    }

Given the Map<String, Set<Integer>>, you can join the ids in the set and get the string you want.
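
The joining step could look roughly like the sketch below. It is not taken from the answer's code: it uses a plain loop with Map.computeIfAbsent instead of streams, and the names WordIndexSketch, indexWords and format, as well as the Film record, are assumptions made up for illustration. Note that it matches whole tokens, whereas the question's contains(word) also matches substrings, so results can differ slightly.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class WordIndexSketch {

    // Hypothetical stand-in for the question's Film type (records need Java 16+).
    record Film(int id, String description) {}

    // Same split pattern as in the question.
    private static final String SEPARATORS = "[\\p{IsPunctuation}\\p{IsWhite_Space}]+";

    // Single pass over the films: each token is added to the index together
    // with the id of the film it came from.
    static Map<String, Set<Integer>> indexWords(List<Film> films) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Film film : films) {
            for (String token : film.description().split(SEPARATORS)) {
                // split() yields a leading empty token when the text starts with a separator
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, key -> new TreeSet<>()).add(film.id());
                }
            }
        }
        return index;
    }

    // Renders one "<word>: <id1>, <id2>, ..." line per word.
    static String format(Map<String, Set<Integer>> index) {
        return index.entrySet().stream()
                .map(entry -> entry.getKey() + ": " + entry.getValue().stream()
                        .map(String::valueOf)
                        .collect(Collectors.joining(", ")))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Film> films = List.of(
                new Film(1, "A brave hero saves the city."),
                new Film(2, "The city never sleeps, hero!"));
        System.out.println(format(indexWords(films)));
    }
}
```

The TreeMap and TreeSet are only there to make the output order deterministic; a HashMap and HashSet would work as well if ordering does not matter. Tokens are case-sensitive here, so "The" and "the" end up as separate words unless the description is lower-cased first.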

Answered By – Nikos Paraskevopoulos

