Per request, here is an outline of our problem, and how we resolved it:
In our system we created a custom Document locking routine (using redis-lock), wherein the following happened in this precise (incorrect) order:
INCORRECT ORDER OF OPERATIONS:
1. Client request received
2. Document locked
3. Document retrieved
4. Document edited
5. Document unlocked
6. Client request resolved
7. Document saved
Once you see it written out, the problem is obvious: we were saving our Documents outside our Document lock.
Let's assume step #6 (resolving the Client request) takes 100ms in our system. That is a 100ms window in which, if any other request grabs that same Document, we're going to have a save conflict (the error in this Question's title is basically a save conflict, IMHO).
As an example: in our system, Request A grabbed Version 1 of Document X, edited it, then unlocked it. Before Request A saved the Document, Request B grabbed Document X and incremented it to Version 2 (read up on Mongoose document versioning for more info about this). Then Request A resolved its Client request and went to save Document X, but it was trying to save Version 1 while the stored Document was already at Version 2, hence the error above.
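To make the race concrete, here is a minimal sketch that reproduces it without any database. Everything in it is a hypothetical stand-in, not our production code: `createStore` imitates a versioned collection, `createLock` is an in-memory mutex used in place of redis-lock, and `VersionConflictError` plays the role of the error in the Question's title.

```javascript
// Hypothetical in-memory stand-ins: a versioned store instead of MongoDB,
// and a promise-based mutex instead of redis-lock.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class VersionConflictError extends Error {}

function createStore() {
  const docs = { X: { version: 1, edits: 0 } };
  return {
    // Return a copy, like loading a document from the database.
    load(id) {
      return { id, ...docs[id] };
    },
    // Reject the save if someone else bumped the version since we loaded.
    save(doc) {
      const stored = docs[doc.id];
      if (stored.version !== doc.version) {
        throw new VersionConflictError(
          `loaded v${doc.version} but store is at v${stored.version}`
        );
      }
      docs[doc.id] = { version: doc.version + 1, edits: doc.edits };
    },
  };
}

function createLock() {
  let tail = Promise.resolve();
  // Resolves to a release() function once every earlier holder has released.
  return function acquire() {
    let release;
    const released = new Promise((resolve) => { release = resolve; });
    const ready = tail.then(() => release);
    tail = tail.then(() => released);
    return ready;
  };
}

// The INCORRECT order: the save happens after the unlock.
async function handleRequest(store, acquire, respondMs) {
  const release = await acquire(); // Document locked
  const doc = store.load("X");     // Document retrieved
  doc.edits += 1;                  // Document edited
  release();                       // Document unlocked
  await sleep(respondMs);          // Client request resolved (takes time)
  store.save(doc);                 // Document saved, outside the lock
}

async function demo() {
  const store = createStore();
  const acquire = createLock();
  const [a, b] = await Promise.allSettled([
    handleRequest(store, acquire, 100), // Request A: slow to resolve
    handleRequest(store, acquire, 0),   // Request B: saves first
  ]);
  return { a, b, final: store.load("X") };
}

demo().then(({ a, b, final }) => {
  console.log("A:", a.status, a.reason ? a.reason.message : "");
  console.log("B:", b.status);
  console.log("final:", final);
});
```

In this sketch Request B saves first, so Request A's save is rejected with the conflict, and A's edit is lost even though both requests held the lock while editing.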
So the fix is easy: save your Documents inside your lock. (In the above example, move #7, the save, to before #5, the unlock. See below.)
CORRECT/FIXED ORDER OF OPERATIONS:
1. Client request received
2. Document locked
3. Document retrieved
4. Document edited
5. Document saved
6. Document unlocked
7. Client request resolved
(You could make an argument that #6 and #7, the unlock and the Client response, should be swapped, but that is outside the scope of Mongo/Mongoose/this question.)
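The corrected order can be sketched the same way. As before, this is an illustration under assumptions rather than our actual code: `createStore`, `createLock`, and `VersionConflictError` are hypothetical in-memory stand-ins for the database, redis-lock, and the save-conflict error. The only real change is that the save now happens before the lock is released, with `try`/`finally` so the lock is released even if the save throws.

```javascript
// Hypothetical in-memory stand-ins: a versioned store instead of MongoDB,
// and a promise-based mutex instead of redis-lock.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class VersionConflictError extends Error {}

function createStore() {
  const docs = { X: { version: 1, edits: 0 } };
  return {
    // Return a copy, like loading a document from the database.
    load(id) {
      return { id, ...docs[id] };
    },
    // Reject the save if someone else bumped the version since we loaded.
    save(doc) {
      const stored = docs[doc.id];
      if (stored.version !== doc.version) {
        throw new VersionConflictError(
          `loaded v${doc.version} but store is at v${stored.version}`
        );
      }
      docs[doc.id] = { version: doc.version + 1, edits: doc.edits };
    },
  };
}

function createLock() {
  let tail = Promise.resolve();
  // Resolves to a release() function once every earlier holder has released.
  return function acquire() {
    let release;
    const released = new Promise((resolve) => { release = resolve; });
    const ready = tail.then(() => release);
    tail = tail.then(() => released);
    return ready;
  };
}

// The CORRECT order: the save happens while the lock is still held.
async function handleRequest(store, acquire, respondMs) {
  const release = await acquire(); // Document locked
  try {
    const doc = store.load("X");   // Document retrieved
    doc.edits += 1;                // Document edited
    store.save(doc);               // Document saved, still inside the lock
  } finally {
    release();                     // Document unlocked, even if the save throws
  }
  await sleep(respondMs);          // Client request resolved
}

async function demo() {
  const store = createStore();
  const acquire = createLock();
  const [a, b] = await Promise.allSettled([
    handleRequest(store, acquire, 100), // Request A: slow to resolve
    handleRequest(store, acquire, 0),   // Request B: waits for A's lock
  ]);
  return { a, b, final: store.load("X") };
}

demo().then(({ a, b, final }) => {
  console.log("A:", a.status);
  console.log("B:", b.status);
  console.log("final:", final);
});
```

Here Request B cannot load the Document until Request A has saved it, so B edits Version 2 rather than a stale Version 1, and both saves go through.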
I am going to leave this question unanswered for a while and see if anyone can shed some light on a better way to isolate the relevant code and troubleshoot this issue. In our case, this was a systemic problem and VERY challenging to troubleshoot for our skill level at the time.