Rust: Data flow improvements to unlock flow in sqlx test #18291

Merged
merged 9 commits into from
Dec 18, 2024

paldepind
Contributor

  • Adds some more data flow tests.
  • Adds a content type for references, and models & as stores and * as reads.
  • Adds a few MaD models to unlock flow in the sqlx SQL injection test.

@github-actions github-actions bot added the Rust Pull requests that update Rust code label Dec 16, 2024
1000 + i
}

fn sink(s: i64) {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 's' is not used.
@@ -0,0 +1,81 @@
// Taint tests for strings

fn source(i: i64) -> String {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 'i' is not used.
"source"
}

fn sink_slice(s: &str) {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 's' is not used.
println!("{}", s);
}

fn sink(s: String) {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 's' is not used.
@paldepind paldepind marked this pull request as ready for review December 16, 2024 11:35
@paldepind paldepind requested review from hvitved and geoffw0 December 16, 2024 11:35
@geoffw0 geoffw0 left a comment

Looks valuable, a few points to discuss.

@@ -46,6 +46,8 @@ module RustTaintTracking implements InputSig<Location, RustDataFlow> {
RustDataFlow::readStep(pred, cs, succ) and
cs.getContent() instanceof ArrayElementContent
)
or
pred.asExpr() = succ.asExpr().(RefExprCfgNode).getExpr()
Contributor

Checking my understanding: when you take a reference &foo, you get data flow from foo to the ReferenceContent of &foo, and you get taint flow from foo to &foo without content?

What sorts of cases do we need the contentless taint flow for?

Contributor Author

Yes, that is right. I added the taint flow to support this line in the SQL injection test:

let unsafe_query_1 = String::from("SELECT * FROM people WHERE firstname='") + &remote_string + "'";

Here remote_string is tainted, and the extra taint step makes unsafe_query_1 tainted as well. One could argue that the reference itself isn't really tainted, but on the other hand the only thing it can be used for is to access tainted data, and it seemed like a simple way to unlock some additional flow. Alternatively, we could extend the handling of + to read ReferenceContent as well?
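For readers following along, here is a minimal sketch of why the reference matters in that line. It assumes a hypothetical `remote_source` helper in place of the real network source, and a `build_query` wrapper that is not part of the test. The point is only that the right-hand operand of `+` on a `String` is a reference, so the tainted bytes sit behind the `&` when they reach the operator.

```rust
// Hypothetical stand-in for a remote source; the real test obtains this
// string from a network request.
fn remote_source() -> String {
    String::from("Alice")
}

// Hypothetical wrapper that builds the query the same way as
// `unsafe_query_1` in the SQL injection test.
fn build_query(remote_string: &str) -> String {
    // `String + &str` uses `impl Add<&str> for String`: the right-hand
    // operand is a reference, and the data it points to is appended.
    String::from("SELECT * FROM people WHERE firstname='") + remote_string + "'"
}

fn main() {
    let remote_string = remote_source();
    // `&remote_string` is a `&String`, which coerces to `&str` here.
    let unsafe_query_1 = build_query(&remote_string);
    println!("{unsafe_query_1}");
}
```

Either modelling choice discussed here (a taint step for `&`, or having `+` read through the reference) has to account for that extra level of indirection.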

Contributor

My intuition is that having + read the ReferenceContent is more accurate, but ... I'm worried this will be a can of worms if we go this way. So I guess we should probably leave it the way it is.

@hvitved do you have an opinion on this?

Contributor

Modelling store steps as also taint steps has proven bad in the past, so I think it would be better to provide a taint flow summary for + which pops ReferenceContent.

Contributor Author

What would be the best way to do that for a built-in operator?

Contributor

I still think we should revert this.

Contributor Author

Modelling store steps as also taint steps has proven bad in the past

Re. this, we also do that right now for arrays (which was inspired by Ruby). Do we want to remove that as well (later)?

Contributor

@hvitved hvitved Dec 18, 2024

Hopefully we only add taint steps for reads out of arrays, and not for stores into arrays?

Contributor Author

Yes, that's right. Got it, taint steps for read steps are fine, but taint steps for store steps are not.

Contributor Author

I still think we should revert this.

Done 👍

let b = &mut a;
sink(*b);
*b = source(37);
sink(*b); // $ MISSING: hasValueFlow=37
Contributor

write_through_borrow was a tough test, but I'm surprised we don't get this one. Do you know what's missing? Is it that for * we have a readStep but this case is storing into it? In Swift I think there was some magic that made this kind of thing work on the left side of assignment.

Contributor Author

Is it that for * we have a readStep but this case is storing into it?

Yes, I think that's it. We'll need to add a case for assignment statements with a * on the left hand side.

Contributor

Yep, and it should be robust enough to handle stuff like (*foo).bar = source() (or whatever the correct syntax would be).
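As a rough illustration of the two assignment shapes mentioned in this thread, the sketch below shows a plain write through `*b` and a field write through `(*foo).bar`. The `Foo` struct and the function names are hypothetical, and `source` is assumed to return `1000 + i`, matching the fragment of the integer test shown earlier.

```rust
// Hypothetical struct used only to illustrate a field write through a deref.
struct Foo {
    bar: i64,
}

// Assumed to match the integer test helper: returns a recognisable value.
fn source(i: i64) -> i64 {
    1000 + i
}

// Write through a mutable borrow: `*b` appears on the left of the assignment.
fn write_through_deref() -> i64 {
    let mut a = 1;
    let b = &mut a;
    *b = source(37); // store into the value behind the reference
    *b // a later read through the same reference sees the new value
}

// Same idea with a field access on the dereferenced value.
fn write_through_field() -> i64 {
    let mut value = Foo { bar: 0 };
    let foo = &mut value;
    (*foo).bar = source(38); // explicit deref, equivalent to `foo.bar = ...`
    value.bar
}

fn main() {
    println!("{}", write_through_deref());
    println!("{}", write_through_field());
}
```

At runtime both writes are visible through the original variable, which is the behaviour a store step for `*` on the left-hand side would need to capture.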

@@ -61,7 +61,7 @@ async fn test_reqwest() -> Result<(), reqwest::Error> {
sink(remote_string1); // $ MISSING: hasTaintFlow

let remote_string2 = reqwest::blocking::get("http://example.com/").unwrap().text().unwrap(); // $ Alert[rust/summary/taint-sources]
- sink(remote_string2); // $ MISSING: hasTaintFlow
+ sink(remote_string2); // $ hasTaintFlow
Contributor

Fantastic!

x = 2; // $ write_access=x
- print_i64_ref(&x); // $ access=x
+ print_i64_ref(&x); // $ read_access=x
Contributor

Hmm, if print_i64_ref took a mutable reference and wrote to it, would we still label it a read_access?

Contributor Author

Yes. It's sort of inaccurate. But I think that in order for read steps from foo to &foo to work the SSA library needs to treat &foo as a read. At least, from what I can see, it seems like the simplest and most straightforward way to handle & and *.
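To make the concern concrete, here is a small sketch contrasting a callee that only reads through a shared reference with one that writes through a mutable reference. `overwrite` and `demo` are hypothetical names; `print_i64_ref` is assumed to simply print its argument.

```rust
// Assumed shape of the test helper: only reads through the reference.
fn print_i64_ref(x: &i64) {
    println!("{x}");
}

// Hypothetical callee that writes through a mutable reference.
fn overwrite(x: &mut i64) {
    *x = 2;
}

// Returns the value of `x` before and after the mutable borrow is used.
fn demo() -> (i64, i64) {
    let mut x = 1;
    print_i64_ref(&x); // taking `&x` can only lead to reads of `x`
    let before = x;
    overwrite(&mut x); // taking `&mut x` lets the callee write to `x`
    (before, x)
}

fn main() {
    let (before, after) = demo();
    println!("{before} -> {after}");
}
```

Whether `&mut x` is labelled a read, a write, or a plain access is the modelling question raised here; the sketch only shows the runtime behaviour behind it.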

Contributor

OK. I'd like to hear @hvitved 's opinion on this point as well, I'm not really sure what other languages do for this and why.

Contributor

See my earlier comment.

@@ -341,14 +341,14 @@ fn add_assign() {
let mut a = 0; // a
a += 1; // $ access=a
print_i64(a); // $ read_access=a
- (&mut a).add_assign(10); // $ access=a
+ (&mut a).add_assign(10); // $ read_access=a
Contributor

Seems like this is writing to a. Same for a few of the other cases.
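A quick way to see the write is to run the same statements and inspect `a` afterwards. The sketch below mirrors the lines from the `add_assign` test; the `add_assign_demo` wrapper that returns `a` is a hypothetical addition for illustration.

```rust
use std::ops::AddAssign;

// Mirrors the statements in the test; the wrapper itself is hypothetical.
fn add_assign_demo() -> i64 {
    let mut a = 0;
    a += 1; // compound assignment: reads and writes `a`
    (&mut a).add_assign(10); // explicit call to the trait method, mutates `a` through `&mut`
    a
}

fn main() {
    println!("{}", add_assign_demo());
}
```

The explicit `add_assign` call is what `+=` desugars to for types implementing `AddAssign`, so from the program's point of view both lines update `a`.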

rust/ql/test/query-tests/security/CWE-089/sqlx.rs (outdated thread, resolved)
@geoffw0
Contributor

geoffw0 commented Dec 16, 2024

Since I'm away for Christmas I'd better say: I'll be happy for this to be merged once the open conversations have been concluded (and I consider "we'll deal with this later" an acceptable conclusion). Don't wait for my final 👍 if someone else wants to approve.

@@ -712,6 +712,11 @@ private class CapturedVariableContent extends Content, TCapturedVariableContent
override string toString() { result = "captured " + v }
}

/** A value refered to by a reference. */
Contributor

referred

@@ -484,7 +484,6 @@ module Impl {
class VariableReadAccess extends VariableAccess {
VariableReadAccess() {
not this instanceof VariableWriteAccess and
not this = any(RefExpr re).getExpr() and
Contributor

I think it may be better to only consider these reads for the SSA library. Should be enough to change certain = false to certain = true here.

Contributor Author

Done. I had to also handle RefExpr in variableReadActual.

pack: codeql/rust-all
extensible: summaryModel
data:
- ["repo:https://github.com/seanmonstar/reqwest:reqwest", "<crate::blocking::response::Response>::text", "Argument[self]", "ReturnValue", "taint", "manual"]
Contributor

I believe it should be ReturnValue.Variant[crate::result::Result::Ok(0)].
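The reasoning behind that access path can be sketched in plain Rust. The example assumes a hypothetical `Response` type whose `text` method has the same shape as the blocking reqwest one (it consumes the response and returns a `Result`); `FetchError` and `fetch_text` are also made up for illustration.

```rust
// Hypothetical error type standing in for the real request error.
#[derive(Debug)]
struct FetchError;

// Hypothetical response type; only the shape of `text` matters here.
struct Response {
    body: String,
}

impl Response {
    // Consumes the response and wraps the body in `Ok`, so the interesting
    // string lives in the first field of the `Ok` variant, not in the
    // `Result` value as a whole.
    fn text(self) -> Result<String, FetchError> {
        Ok(self.body)
    }
}

// Hypothetical helper: fetches the text and unwraps the `Result`.
fn fetch_text(body: &str) -> String {
    let result = Response { body: body.to_string() }.text();
    // `unwrap` extracts the payload of `Ok`, which is where a value-flow
    // model for `Result::unwrap` would read from.
    result.unwrap()
}

fn main() {
    println!("{}", fetch_text("remote data"));
}
```

With the summary targeting the `Ok` payload, extracting it with `unwrap` can be modelled as value flow out of that variant rather than as taint on the whole `Result`.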

- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self].Variant[crate::option::Option::Some(0)]", "ReturnValue", "value", "manual"]
- ["lang:core", "<crate::option::Option>::unwrap", "Argument[self]", "ReturnValue", "taint", "manual"]
Contributor

All these taint models should not be needed after altering the summary above.

Contributor Author

I've moved one of them. But some of our sources specify taint on the entire Result, so I think it'd be fine to keep the others until that is no longer the case.

Contributor

I'd rather we remove these lines and not have flow for now; we should soon be able to have it once #18298 lands. Otherwise, I fear we will forget to remove these lines.

@paldepind paldepind force-pushed the rust-data-flow-models branch from dfa3b82 to dc68260 Compare December 17, 2024 15:20
@paldepind paldepind force-pushed the rust-data-flow-models branch from dc68260 to c1e2197 Compare December 17, 2024 16:25
@paldepind
Contributor Author

Except for the RefExprCfgNode taint step/+ modeling, all the comments should now be addressed :)

@paldepind paldepind requested a review from hvitved December 18, 2024 07:55
@paldepind paldepind merged commit 87b9e60 into github:main Dec 18, 2024
15 checks passed
@paldepind paldepind deleted the rust-data-flow-models branch December 18, 2024 11:00