Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Model FastAPI requests #18318

Merged
merged 4 commits into from
Dec 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
category: minorAnalysis
---
* Added modeling of `fastapi.Request` and `starlette.requests.Request` as sources of untrusted input,
and modeling of tainted data flow out of these request objects.
56 changes: 56 additions & 0 deletions python/ql/lib/semmle/python/frameworks/Starlette.qll
Original file line number Diff line number Diff line change
Expand Up @@ -245,4 +245,60 @@

override DataFlow::Node getAPathArgument() { result = this.getParameter(0, "path").asSink() }
}

/**
* Provides models for the `starlette.requests.Request` class
*
* See https://www.starlette.io/requests/.
*/
module Request {
/** Gets a reference to the `starlette.requests.Request` class. */
API::Node classRef() {
result = API::moduleImport("starlette").getMember("requests").getMember("Request")
or
result = API::moduleImport("fastapi").getMember("Request")
}

/**
* A source of instances of `starlette.requests.Request`, extend this class to model new instances.
*
* This can include instantiations of the class, return values from function
* calls, or a special parameter that will be set when functions are called by an external
* library.
*
* Use the predicate `Request::instance()` to get references to instances of `starlette.requests.Request`.
*/
abstract class InstanceSource extends DataFlow::LocalSourceNode { }

/** A direct instantiation of `starlette.requests.Request`. */
private class ClassInstantiation extends InstanceSource {
ClassInstantiation() { this = classRef().getAnInstance().asSource() }
}

/** Gets a reference to an instance of `starlette.requests.Request`. */
private DataFlow::TypeTrackingNode instance(DataFlow::TypeTracker t) {
t.start() and
result instanceof InstanceSource
or
exists(DataFlow::TypeTracker t2 | result = instance(t2).track(t2, t))
}

/** Gets a reference to an instance of `starlette.requests.Request`. */
DataFlow::Node instance() { instance(DataFlow::TypeTracker::end()).flowsTo(result) }

/**
* Taint propagation for `starlette.requests.Request`.
*/
private class InstanceTaintSteps extends InstanceTaintStepsHelper {
InstanceTaintSteps() { this = "starlette.requests.Request" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding: This would also apply to fastapi.Request but you're just choosing one qualified name to represent the class right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this class is like the old string-based dataflow configurations, so it just needs a unique string to not have cross-talk.


override DataFlow::Node getInstance() { result = instance() }

override string getAttributeName() { result in ["cookies"] }

Check warning

Code scanning / CodeQL

Singleton set literal Warning

Singleton set literal can be replaced by its member.

override string getMethodName() { none() }

override string getAsyncMethodName() { result in ["body", "json", "form", "stream"] }
}
}
}
35 changes: 35 additions & 0 deletions python/ql/test/library-tests/frameworks/fastapi/taint_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -187,3 +187,38 @@ async def websocket_test(websocket: WebSocket): # $ requestHandler routedParamet

async for data in websocket.iter_json():
ensure_tainted(data) # $ tainted


# --- Request ---

import starlette.requests
from fastapi import Request


assert Request == starlette.requests.Request
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assert tell the analysis (type tracking?) that these types are equal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's purely for humans reading the test-file to highlight the fact that they are indeed the same class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we just adjusted a copy of the websocket modeling, which did the same)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking question: If we accidentally dropped either fastapi or starlette from the union in classRef, this test would still pass right? Would we want separate test code paths for the two ways of obtaining the Request type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we accidentally dropped either fastapi or starlette from the union in classRef, this test would still pass right?

yep. that asssert has no impact on the analysis, and could have been a comment.

Would we want separate test code paths for the two ways of obtaining the Request type?

If we wanted 100% test-coverage of our modeling, then yes. My personal opinion is that we're getting into the realm of diminishing returns of doing so, so I didn't bother.


@app.websocket("/req") # $ routeSetup="/req"
async def request_test(request: Request): # $ requestHandler routedParameter=request
ensure_tainted(
request, # $ tainted

await request.body(), # $ tainted

await request.json(), # $ tainted
await request.json()["key"], # $ tainted

# form() returns a FormData (which is a starlette ImmutableMultiDict)
await request.form(), # $ tainted
await request.form()["key"], # $ tainted
await request.form().getlist("key"), # $ MISSING: tainted
await request.form().getlist("key")[0], # $ MISSING: tainted
# data in the form could be an starlette.datastructures.UploadFile
await request.form()["file"].filename, # $ MISSING: tainted
await request.form().getlist("file")[0].filename, # $ MISSING: tainted
Comment on lines +213 to +217
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding: These are missing because we don't track flow from the result of form() through to its own properties?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we don't have taint-steps to attribute reads by default 👍 We would need to model the specific attribute-flow we wanted for multidict and uploaded files, which we haven't done yet.


request.cookies, # $ tainted
request.cookies["key"], # $ tainted
)

async for chunk in request.stream():
ensure_tainted(chunk) # $ tainted
Loading