Issue
I have a List<Film> films, where each film has an int id and a String description.
My task is to map each word from all the descriptions to the ids of all films whose description contains that word. The result has to look something like this:
<word1>: <filmId11>, <filmId12>,..., <filmId1N>
<word2>: <filmId21>, <filmId22>, ..., <filmId2N>
...
I did it using Java Stream API:
private List<Map.Entry<String, String>> wordToFilmIds;

// allWords is a collection field declared elsewhere (its declaration is not shown here)
private void addWordsFromDescriptions(List<Film> films) {
    for (Film film : films) {
        String description = film.description();
        String[] tokens = description.split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+");
        allWords.addAll(Arrays.stream(tokens).toList());
    }
}

private void mapWordsToFilmIDs(List<Film> films) {
    wordToFilmIds = allWords.stream()
            .map(word -> Map.entry(word,
                    // scans every film's description again for every single word
                    films.stream()
                            .filter(film -> film.description().contains(word))
                            .map(film -> String.valueOf(film.id()))
                            .collect(Collectors.joining(","))))
            .toList();
}
The problem is that my solution is too slow for the amount of data I have to handle: there are about 12,000 films and the descriptions are not short. Also, I am not permitted to use multi-threading.
Any idea how I can optimise it?
Right now the program does not finish.
I also tried using parallel streams, but it still did not work.
Solution
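I think the problem is that you iterate over every film for each word, so the work grows roughly as (number of words) × (number of films), with a substring search through each description on top of that. It is doable with a single pass over the films, though: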
Given the helper class:
public class Tuple<A, B> {
    public A a;
    public B b;

    public Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}
Try this:
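// needs java.util.List, java.util.Map, java.util.Set, java.util.stream.Collectors, java.util.stream.Stream
Map<String, Set<Integer>> addWordsFromDescriptions(List<Film> films) {
    return films.stream()
            // pair every token with the film it came from
            .flatMap(film -> tokenizeDescription(film).map(token -> new Tuple<>(token, film)))
            // group by word, collecting the ids of the films that contain it
            .collect(Collectors.groupingBy(
                    tuple -> tuple.a,
                    Collectors.mapping(tuple -> tuple.b.id(), Collectors.toSet())
            ));
}

private Stream<String> tokenizeDescription(Film film) {
    return Stream.of(film.description().split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+"))
            // split() yields an empty first token when the description starts with a separator
            .filter(token -> !token.isEmpty());
}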
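Given the Map<String, Set<Integer>>, you can join the ids in each set to get the string you want. Note that this maps a word to the films whose description contains it as a whole token, whereas String.contains in the original code also matches the word inside longer words.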
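The joining step could be sketched as follows. This is only a minimal illustration under some assumptions: the class name WordIndexFormatter and the method name joinIds are made up for this example, and the ids are sorted purely to make the output stable, which the question does not require. It returns the same List<Map.Entry<String, String>> shape as the wordToFilmIds field in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class WordIndexFormatter {

    // Converts word -> {id, id, ...} into entries of word -> "id,id,...".
    static List<Map.Entry<String, String>> joinIds(Map<String, Set<Integer>> index) {
        return index.entrySet().stream()
                .map(entry -> Map.entry(
                        entry.getKey(),
                        entry.getValue().stream()
                                .sorted() // assumption: sorted ids, only for stable output
                                .map(String::valueOf)
                                .collect(Collectors.joining(","))))
                .toList(); // Stream.toList() needs Java 16+, as in the question's code
    }
}
```

Each set is visited once, so this step costs time proportional to the total number of (word, film id) pairs in the map, not words × films.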
Answered By – Nikos Paraskevopoulos
Answer Checked By – Senaida (BugsFixing Volunteer)