[SOLVED] MongoDB partition window, getting the document where a field has the greatest value

Issue

I am translating some things from SQL into MongoDB.

I have a Mongo table set up where a document can contain lots of information. There are two ids, id_1 and id_2. id_2 has a default value of -1. There is a 1->n relationship between id_1 and id_2. For instance, the data can look like:

id_1   id_2   info
---- | ---- | ----
120   -1      'dont'
120   444     'show'
123   -1      'test'
124   -1      'hello'
125   -1      'world'
125   123     'oh wait'
126   -1      'help'
126   201     'me'
127   -1      'sql'
127   206     'hell'

I want to have a MongoDB query that gets the highest id_2 associated with an id_1.

Here is what the answer should look like given id_1 containing (123,124,125,126,127) and id_2 containing (-1,-1,123,201,206):

id_1   id_2   info
---- | ---- | ----
123    -1     'test'
124    -1     'hello'
125    123    'oh wait'
126    201    'me'
127    206    'hell' 

In SQL this could be done using the following:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id_1 ORDER BY id_2 DESC) rn
    FROM ids
    WHERE id_1 IN (123, 124, 125, 126, 127) AND
          id_2 IN (-1, -1, 123, 201, 206)
)

SELECT id_1, id_2, info
FROM cte
WHERE rn = 1;

In Mongo this can be done with a $group clause, however, it is very slow. See below:

{
      '$sort' : {
           'id_1': 1,
           'id_2': 1   
      }  
},
{
      '$group' : {
          '_id': '$id_1',
          'id_1': {'$first': '$id_1'},
          'info': { '$last': '$info'}
      }  
}

I found this in the documentation:

https://docs.mongodb.com/manual/reference/operator/aggregation/setWindowFields/#std-label-setWindowFields-window

However, I’m not getting good results. I think I’m misunderstanding window. Here is what I have:

    {
            '$match' : {
                'id_1': {'$in' : [123,124,125,126,127]},
                'id_2': {'$in' : [-1,-1,123,201,206]}
            }
    },
    {
        '$setWindowFields': {
            'partitionBy': 'id_1',
            'sortBy' : {
                'id_2': -1
            },
            'output': {
                'info': {
                    '$last': '$info'    
                },
            }
        }
    },
    {
        '$project' : {
            'id_1' : 1,
            'id_2' : 1,
            'info' : 1
        }
    }

This doesn’t really seem to do anything except output every info for every combination of id_1 and id_2. Similarly, adding a range of [0,1] to a window in the output just results in an error that says:

Missing _sortExpr with range-based bounds

Does anyone know how to get the same results that I got in SQL?

Solution

  • $match match id_1 and id_2
  • $setWindowFields unbounded check the whole group by partition
  • $match keep only id_2 = max, means it’s the largest document.
  • $unset remove max since it’s unnecessary
db.collection.aggregate([
  {
    "$match": {
      "id_1": { "$in": [ 123, 124, 125, 126, 127 ] },
      "id_2": { "$in": [ -1, 123, 201, 206 ] }
    }
  },
  {
    $setWindowFields: {
      partitionBy: "$id_1",
      sortBy: { id_2: 1 },
      output: {
        max: {
          $max: "$id_2",
          window: {
            documents: [ "unbounded", "unbounded" ]
          }
        }
      }
    }
  },
  {
    "$match": {
      $expr: { "$eq": [ "$id_2", "$max" ] }
    }
  },
  {
    "$unset": "max"
  }
])

mongoplayground

Answered By – YuTing

Answer Checked By – Mary Flores (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *