Issue
I am translating some things from SQL into MongoDB.
I have a Mongo table set up where a document can contain lots of information. There are two ids, id_1
and id_2
. id_2
has a default value of -1
. There is a 1->n relationship between id_1
and id_2
. For instance, the data can look like:
id_1 id_2 info
---- | ---- | ----
120 -1 'dont'
120 444 'show'
123 -1 'test'
124 -1 'hello'
125 -1 'world'
125 123 'oh wait'
126 -1 'help'
126 201 'me'
127 -1 'sql'
127 206 'hell'
I want to have a MongoDB query that gets the highest id_2
associated with an id_1
.
Here is what the answer should look like given id_1
containing (123,124,125,126,127)
and id_2
containing (-1,-1,123,201,206)
:
id_1 id_2 info
---- | ---- | ----
123 -1 'test'
124 -1 'hello'
125 123 'oh wait'
126 201 'me'
127 206 'hell'
In SQL this could be done using the following:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id_1 ORDER BY id_2 DESC) rn
FROM ids
WHERE id_1 IN (123, 124, 125, 126, 127) AND
id_2 IN (-1, -1, 123, 201, 206)
)
SELECT id_1, id_2, info
FROM cte
WHERE rn = 1;
In Mongo this can be done with a $group
clause, however, it is very slow. See below:
{
'$sort' : {
'id_1': 1,
'id_2': 1
}
},
{
'$group' : {
'_id': '$id_1',
'id_1': {'$first': '$id_1'},
'info': { '$last': '$info'}
}
}
I found this in the documentation:
However, I’m not getting good results. I think I’m misunderstanding window
. Here is what I have:
{
'$match' : {
'id_1': {'$in' : [123,124,125,126,127]},
'id_2': {'$in' : [-1,-1,123,201,206]}
}
},
{
'$setWindowFields': {
'partitionBy': 'id_1',
'sortBy' : {
'id_2': -1
},
'output': {
'info': {
'$last': '$info'
},
}
}
},
{
'$project' : {
'id_1' : 1,
'id_2' : 1,
'info' : 1
}
}
This doesn’t really seem to do anything except output every info for every combination of id_1
and id_2
. Similarly, adding a range of [0,1]
to a window
in the output
just results in an error that says:
Missing _sortExpr with range-based bounds
Does anyone know how to get the same results that I got in SQL?
Solution
$match
matchid_1
andid_2
$setWindowFields
unbounded
check the whole group by partition$match
keep only id_2 = max, means it’s the largest document.$unset
remove max since it’s unnecessary
db.collection.aggregate([
{
"$match": {
"id_1": { "$in": [ 123, 124, 125, 126, 127 ] },
"id_2": { "$in": [ -1, 123, 201, 206 ] }
}
},
{
$setWindowFields: {
partitionBy: "$id_1",
sortBy: { id_2: 1 },
output: {
max: {
$max: "$id_2",
window: {
documents: [ "unbounded", "unbounded" ]
}
}
}
}
},
{
"$match": {
$expr: { "$eq": [ "$id_2", "$max" ] }
}
},
{
"$unset": "max"
}
])
Answered By – YuTing
Answer Checked By – Mary Flores (BugsFixing Volunteer)