Rust: Data flow improvements to unlock flow in sqlx test #18291
    1000 + i
}

fn sink(s: i64) {
Check notice
Code scanning / CodeQL
Unused variable Note test
@@ -0,0 +1,81 @@
// Taint tests for strings

fn source(i: i64) -> String {
Check notice
Code scanning / CodeQL
Unused variable Note test
    "source"
}

fn sink_slice(s: &str) {
Check notice
Code scanning / CodeQL
Unused variable Note test
    println!("{}", s);
}

fn sink(s: String) {
Check notice
Code scanning / CodeQL
Unused variable Note test
Looks valuable, a few points to discuss.
@@ -46,6 +46,8 @@ module RustTaintTracking implements InputSig<Location, RustDataFlow> {
         RustDataFlow::readStep(pred, cs, succ) and
         cs.getContent() instanceof ArrayElementContent
       )
+      or
+      pred.asExpr() = succ.asExpr().(RefExprCfgNode).getExpr()
Checking my understanding: when you take a reference `&foo` you get data flow from `f` to the `ReferenceContent` of `&f` and you get taint flow from `f` to `&f` without content?
What sorts of cases do we need the contentless taint flow for?
Yes, that is right. I added the taint flow to support this line in the SQL injection test:
let unsafe_query_1 = String::from("SELECT * FROM people WHERE firstname='") + &remote_string + "'";
Here `remote_string` is tainted, and the extra taint step makes `unsafe_query_1` tainted as well. One could argue that the reference itself isn't really tainted, but on the other hand the only thing it can be used for is to access tainted data, and it seemed like a simple way to unlock some additional flow. Alternatively, we could extend the handling of `+` to read `ReferenceContent` as well?
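To make the shape of that test line concrete, here is a minimal, self-contained sketch. The helper names `remote_string` and `build_query` are hypothetical and not part of the sqlx test; the hard-coded value merely stands in for data that would come from a remote source.

```rust
// Hypothetical stand-in for a remote source; in the real test the value
// would come from an HTTP response.
fn remote_string() -> String {
    String::from("Alice' OR '1'='1")
}

// Mirrors the shape of the test line: `String + &str + &str`.
// `+` on `String` takes its right-hand side by reference.
fn build_query(firstname: &str) -> String {
    String::from("SELECT * FROM people WHERE firstname='") + firstname + "'"
}

fn main() {
    let remote_string = remote_string();
    // `&remote_string` is a `&String`, which coerces to `&str` at the call.
    let unsafe_query_1 = build_query(&remote_string);
    println!("{}", unsafe_query_1);
}
```

Since the standard library implements `Add<&str>` for `String`, the tainted value only ever reaches `+` behind a reference, which is why the analysis has to get from `&remote_string` back to the referenced data one way or another.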
My intuition is that having `+` read the `ReferenceContent` is more accurate, but I'm worried this will be a can of worms if we go this way. So I guess we should probably leave it the way it is.
@hvitved do you have an opinion on this?
Modelling store steps as also taint steps has proven bad in the past, so I think it would be better to provide a taint flow summary for `+` which pops `ReferenceContent`.
What would be the best way to do that for a built-in operator?
I still think we should revert this.
> Modelling store steps as also taint steps has proven bad in the past

Re. this, we also do that right now for arrays (which was inspired by Ruby). Do we want to remove that as well (later)?
Hopefully we only add taint steps for reads out of arrays, and not for stores into arrays?
Yes, that's right. Got it, taint steps for read steps are fine, but taint steps for store steps are not.
> I still think we should revert this.

Done 👍
    let b = &mut a;
    sink(*b);
    *b = source(37);
    sink(*b); // $ MISSING: hasValueFlow=37
`write_through_borrow` was a tough test, but I'm surprised we don't get this one. Do you know what's missing? Is it that for `*` we have a `readStep` but this case is storing into it? In Swift I think there was some magic that made this kind of thing work on the left side of assignment.
> Is it that for `*` we have a `readStep` but this case is storing into it?

Yes, I think that's it. We'll need to add a case for assignment statements with a `*` on the left-hand side.
Yep, and it should be robust enough to handle stuff like `(*foo).bar = source()` (or whatever the correct syntax would be).
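For reference, both assignment shapes discussed in this thread are valid Rust. The sketch below is only an illustration: the `Foo` struct and the function names are hypothetical, and `source` is a stand-in modelled on the integer test helper rather than the actual test code.

```rust
// Hypothetical struct used only to illustrate a field write through a deref.
struct Foo {
    bar: i64,
}

// Stand-in for the test helper that produces a "tainted" value.
fn source(i: i64) -> i64 {
    1000 + i
}

// Plain write through a mutable borrow: `*b = ...` stores into `a`.
fn write_through_deref() -> i64 {
    let mut a = 1;
    let b = &mut a;
    *b = source(37);
    a
}

// Field write through an explicit deref: `(*r).bar = ...` stores into `foo.bar`.
fn write_field_through_deref() -> i64 {
    let mut foo = Foo { bar: 0 };
    let r = &mut foo;
    (*r).bar = source(38);
    foo.bar
}

fn main() {
    println!("{}", write_through_deref());
    println!("{}", write_field_through_deref());
}
```

In both functions the `*` sits on the left-hand side of the assignment, so the dereference is the target of a store rather than a read.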
@@ -61,7 +61,7 @@ async fn test_reqwest() -> Result<(), reqwest::Error> {
     sink(remote_string1); // $ MISSING: hasTaintFlow

     let remote_string2 = reqwest::blocking::get("http://example.com/").unwrap().text().unwrap(); // $ Alert[rust/summary/taint-sources]
-    sink(remote_string2); // $ MISSING: hasTaintFlow
+    sink(remote_string2); // $ hasTaintFlow
Fantastic!
     x = 2; // $ write_access=x
-    print_i64_ref(&x); // $ access=x
+    print_i64_ref(&x); // $ read_access=x
Hmm, if `print_i64_ref` took a mutable reference and wrote to it, would we still label it a `read_access`?
Yes. It's sort of inaccurate. But I think that in order for read steps from `foo` to `&foo` to work, the SSA library needs to treat `&foo` as a read. At least, from what I can see, it seems like the simplest and most straightforward way to handle `&` and `*`.
OK. I'd like to hear @hvitved's opinion on this point as well; I'm not really sure what other languages do for this and why.
See my earlier comment.
@@ -341,14 +341,14 @@ fn add_assign() {
     let mut a = 0; // a
     a += 1; // $ access=a
     print_i64(a); // $ read_access=a
-    (&mut a).add_assign(10); // $ access=a
+    (&mut a).add_assign(10); // $ read_access=a
Seems like this is writing to `a`. Same for a few of the other cases.
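The point that taking `&mut a` can end in a write is easy to check in plain Rust. The following is a small sketch, not part of the test suite; `bump` and `demo` are hypothetical names introduced for the illustration.

```rust
use std::ops::AddAssign;

// Hypothetical helper: takes a mutable reference and writes through it.
fn bump(x: &mut i64) {
    *x += 1;
}

// Returns the value of `a` after each of the two mutations.
fn demo() -> (i64, i64) {
    let mut a = 0;
    // Same shape as the test line: the borrow itself looks like a read of `a`,
    // but the callee writes through the reference.
    (&mut a).add_assign(10);
    let after_add_assign = a;
    bump(&mut a);
    (after_add_assign, a)
}

fn main() {
    let (first, second) = demo();
    println!("{} {}", first, second);
}
```

In both calls the expression `&mut a` only borrows the variable, and the actual mutation happens inside the callee, which is what makes a purely syntactic read/write label for the borrow imprecise.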
Since I'm away for Christmas I'd better say: I'll be happy for this to be merged once the open conversations have been concluded (and I consider "we'll deal with this later" an acceptable conclusion). Don't wait for my final 👍 if someone else wants to approve.
@@ -712,6 +712,11 @@ private class CapturedVariableContent extends Content, TCapturedVariableContent
   override string toString() { result = "captured " + v }
 }

+/** A value refered to by a reference. */
referred
@@ -484,7 +484,6 @@ module Impl {
   class VariableReadAccess extends VariableAccess {
     VariableReadAccess() {
       not this instanceof VariableWriteAccess and
-      not this = any(RefExpr re).getExpr() and
I think it may be better to only consider these reads for the SSA library. Should be enough to change `certain = false` to `certain = true` here.
Done. I had to also handle `RefExpr` in `variableReadActual`.
pack: codeql/rust-all
extensible: summaryModel
data:
- ["repo:https://github.com/seanmonstar/reqwest:reqwest", "<crate::blocking::response::Response>::text", "Argument[self]", "ReturnValue", "taint", "manual"]
I believe it should be `ReturnValue.Variant[crate::result::Result::Ok(0)]`.
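The suggestion follows from the shape of the return type: the test calls `.text().unwrap()`, so the string lives in the first field of the `Ok` variant rather than in the `Result` as a whole. A minimal sketch of that shape, using a hypothetical `text` function (this is not the reqwest API) and a hypothetical `payload_or_default` helper:

```rust
// Hypothetical stand-in for a fallible call such as `Response::text`;
// this is NOT the reqwest API, only something with the same return shape.
fn text(body: &str) -> Result<String, String> {
    if body.is_empty() {
        Err(String::from("empty body"))
    } else {
        Ok(body.to_string())
    }
}

// Extracts the payload stored in the first field of `Ok`, or a fallback.
fn payload_or_default(r: Result<String, String>) -> String {
    match r {
        Ok(s) => s,
        Err(_) => String::from("<no data>"),
    }
}

fn main() {
    // `unwrap` returns exactly the value held in `Ok(..)`.
    let remote_string2 = text("remote data").unwrap();
    println!("{}", remote_string2);
    println!("{}", payload_or_default(text("")));
}
```

Modelling the flow into the `Ok(0)` position keeps the summary aligned with where the data is actually stored, so a value-preserving model for `unwrap` can carry it out again.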
- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self].Variant[crate::option::Option::Some(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self]", "ReturnValue", "taint", "manual"]
All these `taint` models should not be needed after altering the summary above.
I've moved one of them. But some of our sources specify taint on the entire `Result`, so I think I'd be fine to keep the others until that is no longer the case.
I'd rather we remove these lines and not have flow for now; we should soon be able to have it once #18298 lands. Otherwise I fear we'll forget to remove these lines.
dfa3b82 to dc68260
dc68260 to c1e2197
Except from the |
`&` as stores, and `*` as reads.