Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mongodb - Sorting by maximum array field, ascending or descending

In my Meteor app, I have a collection of documents with an array of subdocuments that look like this:

/* 1 */
{
    "_id" : "5xF9iDTj3reLDKNHh",
    "name" : "Lorem ipsum",
    "revisions" : [ 
        {
            "number" : 0,
            "comment" : "Dolor sit amet",
            "created" : ISODate("2016-02-11T01:22:45.588Z")
        }
    ],
    "number" : 1
}

/* 2 */
{
    "_id" : "qTF8kEphNoB3eTNRA",
    "name" : "Consecitur quinam",
    "revisions" : [ 
        {
             "comment" : "Hoste ad poderiquem",
             "number" : 1,
             "created" : ISODate("2016-02-11T23:25:46.033Z")
        }, 
        {
            "number" : 0,
            "comment" : "Fagor questibilus",
            "created" : ISODate("2016-02-11T01:22:45.588Z")
        }
    ],
    "number" : 2
}

What I want to do is query this collection and sort the result set by the maximum date in the created field of the revisions array, which I haven't been able to pull off yet. Some constraints I have are:

  • Just sorting by revisions.created doesn't cut it, because the date picked from the array depends on the sort direction. I have to use the maximum date in the array regardless of sort order.
  • I cannot rely on post-query manipulation of an unsorted result set, so, this must be done by a proper query or aggregation by the database.
  • There's no guarantee that the revisions array will be pre-sorted.
  • There may be extra fields in some documents and those have to come along, so careful with $project.
  • Meteor is still using MongoDB 2.6, newer API features are no good :(


1 Reply


The basic problem with what you are asking here comes down to the fact that the data in question is within an "array", and therefore there are some basic assumptions made by MongoDB as to how this gets handled.

If you applied a sort in "descending order", then MongoDB will do exactly what you ask and sort the documents by the "largest" value of the specified field within the array:

.sort({ "revisions.created": -1 })

But if instead you sort in "ascending" order then of course the reverse is true and the "smallest" value is considered.

.sort({ "revisions.created": 1 })
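
To make the direction-dependent behaviour concrete, here is a minimal plain-JavaScript sketch of the rule just described. It does not talk to MongoDB at all: `sortKey` and `sortLikeMongo` are hypothetical helper names, the two documents are stand-ins labelled "A" and "B", and the logic only imitates how an array field contributes its smallest value to an ascending sort and its largest value to a descending one.

```javascript
// Stand-in documents, trimmed to the fields that matter for the sort
const docs = [
  { _id: "A", revisions: [{ created: new Date("2016-02-11T01:22:45.588Z") }] },
  { _id: "B", revisions: [
    { created: new Date("2016-02-11T23:25:46.033Z") },
    { created: new Date("2016-02-01T00:00:00.000Z") }
  ] }
];

// Hypothetical helper: the value an array field contributes to a sort.
// Ascending (1) uses the smallest element, descending (-1) the largest.
function sortKey(doc, direction) {
  const times = doc.revisions.map(r => r.created.getTime());
  return direction === 1 ? Math.min(...times) : Math.max(...times);
}

// Hypothetical helper: sorts a copy of the documents by that key
function sortLikeMongo(input, direction) {
  return input.slice().sort(
    (a, b) => direction * (sortKey(a, direction) - sortKey(b, direction))
  );
}

const ascendingIds = sortLikeMongo(docs, 1).map(d => d._id);
const descendingIds = sortLikeMongo(docs, -1).map(d => d._id);
```

With this data "B" comes first in both directions: its oldest revision wins the ascending sort and its newest revision wins the descending one. An ascending sort by maximum date would instead put "A" first, which is why the plain sort is not enough.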

So the only way to do this is to work out the maximum date from the data in the array, and then sort on that result. This basically means applying .aggregate(), which for Meteor is a server-side operation and unfortunately looks something like this:

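Collection.aggregate([
    // Emit one document per element of the "revisions" array
    { "$unwind": "$revisions" },
    // Rebuild each document and keep the largest "created" date
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "revisions": { "$push": "$revisions" },
        "number": { "$first": "$number" },
        "maxDate": { "$max": "$revisions.created" }
    }},
    // Sort on the computed value; use -1 for descending
    { "$sort": { "maxDate": 1 } }
])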
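
For anyone unfamiliar with these stages, the following is a rough in-memory imitation of what `$unwind`, `$group` and `$sort` do in the pipeline above. It is plain JavaScript for illustration only, not a replacement for the aggregation, and `unwind` and `groupMax` are hypothetical names.

```javascript
// Imitates { "$unwind": "$revisions" }: one row per array element
function unwind(docs) {
  return docs.flatMap(doc =>
    doc.revisions.map(rev => Object.assign({}, doc, { revisions: rev }))
  );
}

// Imitates the "$group" stage: rebuild the array, track the largest date
function groupMax(rows) {
  const groups = new Map();
  for (const row of rows) {
    let g = groups.get(row._id);
    if (!g) {
      // "$first" takes these values from the first row seen per _id
      g = { _id: row._id, name: row.name, revisions: [], number: row.number, maxDate: null };
      groups.set(row._id, g);
    }
    g.revisions.push(row.revisions);            // "$push"
    if (g.maxDate === null || row.revisions.created > g.maxDate) {
      g.maxDate = row.revisions.created;        // "$max"
    }
  }
  return Array.from(groups.values());
}

const sample = [
  { _id: "qTF8kEphNoB3eTNRA", name: "Consecitur quinam", number: 2, revisions: [
    { number: 1, created: new Date("2016-02-11T23:25:46.033Z") },
    { number: 0, created: new Date("2016-02-11T01:22:45.588Z") }
  ] },
  { _id: "5xF9iDTj3reLDKNHh", name: "Lorem ipsum", number: 1, revisions: [
    { number: 0, created: new Date("2016-02-11T01:22:45.588Z") }
  ] }
];

// Imitates { "$sort": { "maxDate": 1 } }
const result = groupMax(unwind(sample)).sort((a, b) => a.maxDate - b.maxDate);
```

Note that only the fields named in the group step survive, which mirrors the real `$group` stage: any extra fields on a document would have to be listed there as well.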

Or at best with MongoDB 3.2, where $max can be applied directly to an array expression:

Collection.aggregate([
    { "$project": {
        "name": 1,
        "revisions": 1,
        "number": 1,
        "maxDate": {
            "$max": {
                "$map": {
                    "input": "$revisions",
                    "as": "el",
                    "in": "$$el.created"
                }
            }
        }
    }},
    { "$sort": { "maxDate": 1 } }
])

But really, neither is that great. The MongoDB 3.2 approach has far less overhead than what is available in prior versions, yet it is still not as good as you can get in terms of performance, because every document has to be passed through to work out the value to sort on.

So for best performance, "always" keep the data you are going to need "outside" of the array. For this there is the $max "update" operator, which will only replace a value within the document "if" the provided value is "greater than" the value already there, i.e.:

Collection.update(
    { "_id": "qTF8kEphNoB3eTNRA" },
    { 
        "$push": {
            "revisions": { "created": new Date("2016-02-01") }            
        },
        "$max": { "maxDate": new Date("2016-02-01") }
    }
)
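
One way to keep this consistent is to build the modifier in a single place, so that every write which adds a revision also carries the `$max`. A minimal sketch, assuming a hypothetical helper named `revisionModifier`; the second function, `applyMax`, only imitates the `$max` update semantics in memory so the behaviour can be checked without a database.

```javascript
// Hypothetical helper: builds the update modifier for adding one revision
function revisionModifier(revision) {
  return {
    "$push": { "revisions": revision },
    "$max": { "maxDate": revision.created }
  };
}

// In-memory imitation of "$max" on one field: the stored value is
// replaced only when it is missing or the candidate is greater
function applyMax(doc, field, candidate) {
  if (doc[field] === undefined || candidate > doc[field]) {
    doc[field] = candidate;
  }
  return doc;
}

const modifier = revisionModifier({ created: new Date("2016-02-01") });

// The stored maximum is newer than the pushed revision, so it is kept
const stored = { maxDate: new Date("2016-02-11T23:25:46.033Z") };
applyMax(stored, "maxDate", modifier["$max"].maxDate);
```

In the app the result would be passed as the second argument of the update shown above, for example `Collection.update({ "_id": id }, revisionModifier(revision))`.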

This means that the value you want will "always" be present within the document already, so it is now just a simple matter of sorting on that field:

.sort({ "maxDate": 1 })

So for my money, I would go through the existing data with either of the .aggregate() statements available, and use those results to update each document to contain a "maxDate" field. Then change the coding of all additions and revisions of array data to apply that $max "update" on every change.
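
The one-off backfill could look roughly like this. It is only a sketch: `buildBackfill` is a hypothetical helper that turns the aggregation output (documents carrying `_id` and `maxDate`) into selector/modifier pairs, and the input array here is made-up sample data standing in for the real results.

```javascript
// Hypothetical helper: one selector/modifier pair per aggregated document.
// Documents without a usable date are skipped rather than written.
function buildBackfill(aggregated) {
  return aggregated
    .filter(doc => doc.maxDate instanceof Date)
    .map(doc => ({
      selector: { "_id": doc._id },
      // "$max" rather than "$set", so a newer value written in the
      // meantime is not overwritten by the backfill
      modifier: { "$max": { "maxDate": doc.maxDate } }
    }));
}

const ops = buildBackfill([
  { _id: "5xF9iDTj3reLDKNHh", maxDate: new Date("2016-02-11T01:22:45.588Z") },
  { _id: "qTF8kEphNoB3eTNRA", maxDate: new Date("2016-02-11T23:25:46.033Z") },
  { _id: "no-date-example", maxDate: null }
]);

// On the server, assuming `Collection` is the collection used above:
// ops.forEach(op => Collection.update(op.selector, op.modifier));
```

Keeping the pair-building separate from the actual writes makes it easy to inspect what would be changed before touching the data.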

Having a solid field rather than a calculation always makes much more sense if you are using it often enough. And the maintenance is quite simple.


In any case, with the example date applied above, which is "less than" the other maximum dates present, all forms return the following for me:

{
        "_id" : "5xF9iDTj3reLDKNHh",
        "name" : "Lorem ipsum",
        "revisions" : [
                {
                        "number" : 0,
                        "comment" : "Dolor sit amet",
                        "created" : ISODate("2016-02-11T01:22:45.588Z")
                }
        ],
        "number" : 1,
        "maxDate" : ISODate("2016-02-11T01:22:45.588Z")
}
{
        "_id" : "qTF8kEphNoB3eTNRA",
        "name" : "Consecitur quinam",
        "revisions" : [
                {
                        "comment" : "Hoste ad poderiquem",
                        "number" : 1,
                        "created" : ISODate("2016-02-11T23:25:46.033Z")
                },
                {
                        "number" : 0,
                        "comment" : "Fagor questibilus",
                        "created" : ISODate("2016-02-11T01:22:45.588Z")
                },
                {
                        "created" : ISODate("2016-02-01T00:00:00Z")
                }
        ],
        "number" : 2,
        "maxDate" : ISODate("2016-02-11T23:25:46.033Z")
}

This correctly places the first document at the top of the sort order, based on its "maxDate".

