-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 4558 from Core CLI: Adding flock
#358
base: dev
Are you sure you want to change the base?
Conversation
bluemix/configuration/persistor.go
Outdated
@@ -44,14 +47,26 @@ func (dp DiskPersistor) Load(data DataInterface) error { | |||
return err | |||
} | |||
|
|||
if err != nil { | |||
err = dp.write(data) | |||
if err != nil { // strange: requiring an error (to allow write attempt to continue), as long as it is not a permission error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange code.
Requiring a non-permission error to actually carry out the writing to the config? Weird!
Please see my OP for my thoughts on why this is benign despite being pretty obviously counterintuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange but the docs strongly recommend using TryLock instead due to the potential blocking issue. Can we try that and see if it resolves this issue
Lock is a blocking call to try and take an exclusive file lock. It will wait until it is able to obtain the exclusive file lock. It's recommended that TryLock() be used over this function. This function may block the ability to query the current Locked() or RLocked() status due to a RW-mutex lock. (doc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested yet but I am thinking using the TryLock might fix the issue that you are seeing
bluemix/configuration/persistor.go
Outdated
@@ -44,14 +47,26 @@ func (dp DiskPersistor) Load(data DataInterface) error { | |||
return err | |||
} | |||
|
|||
if err != nil { | |||
err = dp.write(data) | |||
if err != nil { // strange: requiring an error (to allow write attempt to continue), as long as it is not a permission error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange but the docs strongly recommend using TryLock instead due to the potential blocking issue. Can we try that and see if it resolves this issue
Lock is a blocking call to try and take an exclusive file lock. It will wait until it is able to obtain the exclusive file lock. It's recommended that TryLock() be used over this function. This function may block the ability to query the current Locked() or RLocked() status due to a RW-mutex lock. (doc)
bluemix/configuration/persistor.go
Outdated
} | ||
return err | ||
} | ||
|
||
func (dp DiskPersistor) lockedWrite(data DataInterface) error { | ||
lockErr := dp.fileLock.Lock() // provide a file lock, in addition to the RW mutex (in calling functions), just while dp.write is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: For readability can we move the comments above line 57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
c.onError(err) // NOTE: the error could be triggered by a file-locking issue, | ||
// a file-unlocking issue, OR a file-writing issue; currently, the error chain | ||
// ends in `panic("configuration error: " + err.Error()"`, which is somewhat | ||
// generic but sufficient for all three of these error types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: For readability can we move the comment above line 157.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
If we use this instead, we might need to code in a loop with the following psuedocode, since it is nonblocking and therefore will not wait until the exclusive lock is acquired:
Thoughts @Aerex ? |
Hmm actually lets use TryLockContext which provides a means to retry the lock after a given interval |
Ok. I like this idea. Basically the same as the loop, but with a delay in each iteration. |
Update: I think we should delay the merge based on the recent discovery that |
After significant research on the most efficient, least intrusive, and maximally functionality-preserving way to increase the concurrency-safety of the CLI, we decided to use
flock
in additionsync.RWMutex
. I thought about and trialed this on my system repeatedly before deciding that a simple addition of a file lock to thepersistor
'swrite
method would suffice to address the suspected problem that overly-permissive access allowance to the file for multiple threads, in the read-only logic of the code, caused file corruption.One piece of code that is commented about inline starting with
strange
(and also called out on this PR) is worth highlighting here: it seems that on firstSave
of the config, the code is requiring an error--that is not a permission error--to have occurred to write to the config file. This is very peculiar. My suspicion is that this doesn't matter:Save
is called via aninitOnce
function that ensures only one write attempt to write to the config file is carried out this way. Subsequent attempts to write to the config file will occur viaLoad
, which simply had a call to thepersistor
file'swrite
function.One helpful piece of information to keep in mind is that
DiskPersistor
is simply an object that houses the filepath (and that I've added to, to also contain the file lock) of the config. Additionally, the actual writing to the config file happens via thewrite
function (which is now a callee of thelockedWrite
function I've created), but the data structure (to be written to the config file) is modified, by dot-access, through functions peculiarly calledcb()
.For testing, I think it is sufficient to chain together multiple
ibmcloud ...
commands that involve read/writes to the commit with&
s, e.g.ibmcloud config --http-timeout 60 & ibmcloud config --http-timeout 30 & ibmcloud config --http-timeout 45 & ibmcloud config --list & ibmcloud config --http-timeout 53
. I also tested out the code with a Thread Group containing multiple OS Process Samplers in JMeter, and there was no data corruption.PLEASE NOTE: to test this out with the core CLI's binary, it is important to run the following commands in the local copy of the core CLI repo (
bluemix-cli
) from where you will build a binary for testing, in the following order, to avoid issues with module pathing:go get github.ibm.com/arf/cli-dev-plugin/plugin@<commit-hash>
. The<commit-hash>
at the end of this command string corresponds to a version of thedev
plugin code that uses the same packages as the SDK code in step 2. At the time of writing, that is2a362c8b25d44002dcae8d8b37fcfae54dd94d17
go get github.com/IBM-Cloud/ibm-cloud-cli-sdk@<commit-hash>
. The<commit-hash>
at the end of this command string corresponds to a version of the code in this repo that uses the same packages as the dev-plugin's code in step 1. At the time of writing, that is1d5f40b5afd04087c3f803e24561a28cc16ebefe
.go mod tidy
EDIT: Regarding testing, I think I will make a ruby script that hands off to and/or interweaves with a bash script for multithreaded modifications of the config. I think this can happen separately, as the PR is open for review, and be merged later.