Node.js APIs are inconvenient to use with URL strings #48994
Comments
The only part of ESM that consistently has a File URLs are not paths and I don't think they ever should have been treated as such. The absolute nature is verbose, which has poor usability, and putting them in the URL class conflates structural validation of the URL format with correctness of the path string which is inaccurate. There are valid paths on some systems which are not valid URLs. I'm 👎🏻 on this. I feel even supporting |
@GeoffreyBooth In the very example you gave, @aduh95 explained that |
With regard to the second point, I would be happy to add a path string to As for “changing everything,” currently
In Node today it might refer literally to a folder named I’m not proposing making the string parsing ambiguous. We have two options for how to handle disambiguation, and we would document what method we’re choosing:
The first option would be a semver-major change, because of the extreme edge case of a file or folder actually named |
I'm not convinced this option would help. People don't want (or very rarely) to |
The kind of use case I had in mind was something like |
A directory name equal or starting with
For APIs that accept URL instances, we can use
The correct way here is Overall, I think the directions of improvement might be:
|
+1 to that, related: wintercg/fetch#5 |
+1 to that. I also think adding the string version of |
One word: TOCTOU As to the general proposal:
Formulated like that, I hope everyone agrees 1 is strictly superior to 2. |
This seems sensible to me.
2 is already the reality of strings. The two kinds of strings here are easily differentiated and are extremely prevalent in the wild. |
Generally, I don't understand what makes url so good at manipulating filesystem paths.
Yes, having different APIs on Windows and Posix is really a pain (so much we ended up writing a translation layer in Yarn to only ever deal with posix path, even on Windows), but I'm not convinced using a completely different data structure, one that wasn't created for filesystem purposes, is needed to address that. At least in this proposal we're talking about URL strings, rather than URL instances, but unless the
That would make |
Also something of note: I wonder how this would interact with third-party tools that check whether something is absolute or not by checking if the first character is For example, I could see it being a problem on archive unpackers (tar, zip) if they protect against absolute paths by manually checking the first characters. In this scenario, a malicious archive could perhaps contain an absolute file that wouldn't be detected as such, and have write access on any file on the disk. |
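The unpacker concern above can be sketched in a few lines. This is a hypothetical guard, not code from any real archive library, and the entry names are made up. It assumes the guard only rejects POSIX-style absolute paths, as the comment describes.

```javascript
import path from 'node:path';

// Hypothetical guard of the kind an unpacker might use today:
// reject any entry whose name is an absolute POSIX path.
function looksAbsolute(entryName) {
  return path.posix.isAbsolute(entryName);
}

// Made-up archive entry names for illustration.
const entries = ['/etc/passwd', 'docs/readme.txt', 'file:///etc/passwd'];

// Only the first entry is flagged; the file: string starts with "f",
// so a first-character check treats it as a relative name.
const flagged = entries.filter(looksAbsolute);
console.log(flagged);
```

If fs started treating strings that begin with file: as URLs, the third entry would pass a guard like this and still name an absolute location, which is the scenario the comment worries about.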
I think @bnoordhuis has summarized sufficiently why this is something that we absolutely should not do.
@JakobJingleheimer Could you elaborate on this? What different string formats does Node.js interpret beyond whatever the underlying operating system or file system implements? |
I meant for the web in general:
It seems reasonable for |
This is what I mainly use nowadays, via a wrapper around fs/os that is essentially just Python's pathlib |
@JakobJingleheimer I (still) strongly disagree. JavaScript and the web have a messy history that has often favored convenience over sanity, but we should learn from past mistakes and not add to them. As multiple people have explained, accepting file URLs as strings will lead either to unjustified breaking changes, ambiguity, or race conditions. None of these options seem acceptable, so I personally don't think there's any chance this feature request will be adopted. |
I think maybe this is the wrong request. Making an API that would work for both or just focused on URLs (and thus able to work on all protocols) seems reasonable to suggest from my perspective; this avoids a lot of collision complexity and backwards-compatibility problems. Additionally the behavior of
let u = new URL('./symdir', import.meta.url)
fs.readdirSync(new URL('./..', u)).length // 58, doesn't even see symlink due to /../ normalization
fs.readdirSync(path.join(url.fileURLToPath(u), '..')).length // 53 goes through symlink |
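Part of the difference in the snippet above can be reproduced without touching the filesystem at all. The following is a minimal sketch, assuming a POSIX system and a hypothetical module URL. It only shows how the two expressions resolve lexically and says nothing about how fs then treats symlinks.

```javascript
import path from 'node:path';
import { fileURLToPath } from 'node:url';

// Hypothetical module location; nothing here reads the disk.
const base = 'file:///project/src/index.mjs';
const u = new URL('./symdir', base); // file:///project/src/symdir

// URL resolution treats "symdir" as the last segment (no trailing slash),
// drops it, and then applies "..".
const viaUrl = new URL('./..', u).href;

// path.posix.join keeps "symdir" as a directory and then steps out of it.
const viaPath = path.posix.join(fileURLToPath(u), '..');

console.log(viaUrl);
console.log(viaPath);
```

In this sketch the two results already name different directories (one level apart) before any symlink is consulted, so a trailing slash on the base URL matters when comparing the two approaches.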
Firstly it's important not to let pragmatism be trumped by technicalities. Build tools can detect usage of There is a big benefit to aligning on a contextual asset story that is standards based. These edge cases are completely valid as a technical concern, but I don't think most users would appreciate the difference between If the major concern is one of compatibility and breaking edge cases, perhaps there's a flag to disable the behaviour? A standards first asset solution would be the best line to consider in my opinion though. And if not this, then we should pick up a spec like https://github.com/tc39/proposal-asset-references and drive it towards the use case. I think this issue describes the simplest story we'll get though. |
I think the |
@guybedford I disagree that the |
The Deno docs for const worker = new Worker(import.meta.resolve("./worker.ts")); Our |
I'd note that |
I realize that this might be an unpopular opinion in the JavaScript space, but I firmly believe that correctness and safety outweigh convenience.
That's the problem. You have to explain it. Users must be aware of it. This does not make things simpler. It increases complexity. Things are simple right now: it's a URL if and only if it's If my understanding of your proposal is accurate, it leads to problematic inconsistencies.
// Throws an error.
fs.mkdirSync(new URL("http://example.com/bar"), { recursive: true })
// Creates a directory named "http:" relative to the working directory.
fs.mkdirSync("http://example.com/bar", { recursive: true })
// Creates a directory named "example.com" in the root directory.
fs.mkdirSync("file://example.com/bar", { recursive: true })
Yes, of course they do, because Deno implements Web Workers, whose first argument is a URL string. Node.js does not implement Web Workers. See #43583. |
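The second case in the mkdirSync example above follows from how path strings are parsed today. A small sketch, using only the example string from that comment and POSIX path rules, shows the same characters read as a URL and as a path:

```javascript
import path from 'node:path';

const input = 'http://example.com/bar';

// As a URL instance the scheme is explicit, so an API can reject it up front.
const asUrl = new URL(input);
console.log(asUrl.protocol);

// As a path string the same characters are ordinary relative path segments;
// the doubled slash collapses during normalization.
const asPath = path.posix.normalize(input);
const segments = asPath.split('/');
console.log(asPath);
console.log(segments);
```

The first segment of the normalized path is the directory name "http:", which is why a recursive mkdir on the plain string creates it under the working directory.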
No, I really don’t. I’m writing this issue as a user, not as a developer. As a user, I want to be able to use URL strings with Node APIs. That’s it, that’s the feature request. We don’t expect users to come up with implementations and solve all edge cases. What I expect and would appreciate from the others commenting on this thread is suggestions for how to achieve my feature request. If the ideas I brainstormed don’t work, fine, propose others. It’s not my responsibility to come up with successive ideas and try to defend them. You all are very smart people and I’m sure you can contribute ideas on how to make my life as a user easier. #48994 (comment) is a great example of what I consider to be a constructive, collaborative comment. |
The only actionable request I can find in this thread is the one given by Guy:
So maybe we should rename this issue to "Node.js @GeoffreyBooth You claim it's a user request, but you are also suggesting a technical solution (support for URL strings in |
There were several good ideas in #48994 (comment) too. I’m happy to rewrite the initial post but I don’t know what I would change it to. Basically I want to make working with URL strings easier, and that was an example of one place where I could see us making a change to do so. If we can narrow down a list of other places, or other actionable steps, I’ll update the top post with wherever this discussion goes. As for |
Yes sorry, I didn't mean to disregard Livia's ideas, which are all relevant btw. But Livia's comment is listing possible solutions, not defining the problem, and that's what I meant by "actionable requests".
Maybe we can, what we care about is to preserve cross-compat, but returning a |
I can support a new API I guess the broader “defining the problem” is that I want to work with URLs more easily in ESM in Node. Yes another approach besides getting Node APIs to accept URL strings is to more easily get URL instances, though we’re back to adding on |
I probably do not have extra valuable points here, as many have already pointed out why, but I genuinely (as a user) do not think this should land (at least the way it was initially asked). I understand that users request features, and as a user, I'm genuinely interested in knowing how the maintainers and developers of X can support me with said feature. That said, that doesn't mean developers should implement X because said users want it. I genuinely believe in the convenience of having simple URL strings, I genuinely like that, and as a user, I love that. As a developer, I wonder what said convenience implies regarding inconsistencies, security issues, and underlying complexities. Some examples in this issue have shown that the current ask might not be ideal. I'd like to know if, instead of rectifying the same ask, the original feature request could be redesigned in a way that suits the scenarios we want without breaking existing code and creating numerous issues. This sort of breaking change can, and will, break a lot of production code that expects said behavior to be... well.. said behavior. It reminds me (even if it's an entirely separate scenario) of the infamous discussions regarding "undefined and null and the fact that one of those should be removed". I do believe this feature request has a future; I sympathize with the general idea, but I believe there might be different ways of accomplishing similar solutions without changing well-known APIs (and a lot of APIs). For example, |
To clarify why the said suggestions are in huge offset from the initial proposal of this issue (i.e. oriented on URL instances rather than strings, and not touching To address the issue of urlstrings directly, in pure theory, I think there are only two ways to tell if "X is URL":
The second is theoretically feasible in one of two ways:
This approach would still allow us to use "relative URLs", we just have to resolve the input string like this: With this, it is feasible to make a pretty consistent userland wrapper around As for urlstrings themselves, I don't think it's a big problem in returning a string that is guaranteed to be a URL (e.g. what
As for |
Perhaps an alternative would be, like we have |
I was just thinking, if we can’t keep overloading the first parameter to the |
Here's a very dirty proof of concept, proxying
import { readFile } from 'fsURL';
// these are all the same
await readFile('/etc/fstab'); // relative url that starts from file:///
await readFile('../../../../../../../../../etc/fstab'); // relative url that works with subdirectory depth <= 9
await readFile('file:///etc/fstab'); // absolute url
await readFile(new URL('file:///etc/fstab')); // URL instance
await readFile(Buffer.from('file:///etc/fstab')); // Buffer instance
// these point to test file assuming cwd to be one level higher
await readFile(import.meta.url); // absolute url of this file
await readFile('fsURL/test.mjs'); // relative url that starts from cwd
await readFile('./fsURL/test.mjs'); // relative url that explicitly starts from cwd
For the reasons described above, I don't think we should have this in Node.js core. |
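The resolution rule that the comments in the proof of concept describe can be approximated with the URL constructor alone. This is a sketch of that rule under stated assumptions, not the actual fsURL implementation: resolveAsUrl and the cwdUrl value are hypothetical, and the working directory is hard-coded so the output is deterministic.

```javascript
// Hypothetical working directory as a directory URL (note the trailing slash).
const cwdUrl = new URL('file:///home/user/project/');

// Assumed rule: every string is a URL reference resolved against cwd,
// and anything that does not end up as a file: URL is rejected.
function resolveAsUrl(input, base = cwdUrl) {
  const url = new URL(input, base);
  if (url.protocol !== 'file:') {
    throw new TypeError(`not a file URL: ${url.href}`);
  }
  return url;
}

const inputs = [
  '/etc/fstab',                           // path-absolute reference
  '../../../../../../../../../etc/fstab', // ".." stops at the root
  'file:///etc/fstab',                    // absolute URL
  'fsURL/test.mjs',                       // relative to cwd
  './fsURL/test.mjs',                     // explicitly relative to cwd
];

for (const input of inputs) {
  console.log(input, '->', resolveAsUrl(input).href);
}
```

Under these assumptions the first three inputs resolve to the same file URL and the last two resolve to the same file under the working directory, matching the grouping in the comments above. A string such as 'http://example.com/' is rejected by the protocol check.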
As a side note, support for @LiviaMedeiros folks are really using |
I doubt that maintaining a third (or fourth if we separate sync and async APIs) |
I rather think the idea from @mcollina above might be simpler than trying to make filesystem specific |
Supporting But still having support for an |
I have actually been thinking it would be nice if we had a new fs API more based on web standards and with, as you suggest, the ability to map to different targets like in-memory representations, remote content, over an archive, etc. We could have various APIs which return a FileSystemDirectoryHandle and let you interact with content from any sort of source or target. |
What happens if you have a directory in the cwd named I think it's best for security and intelligibility if either there's a
|
This was discussed above: we just define in the docs that path strings beginning with All that said, overloading APIs that accept path strings to also accept URL strings is just one potential solution, and some of the other ideas like
Footnotes
|
TIL! Sorry for the non sequitur suggestion ;)
I think this is the sort of workaround that's going to be a lot more fraught than it seems at first. Like, "breaking change" can mean "this will blow up or otherwise obviously not work unless you change your code to accommodate it", but in this case, it's more like "this will function normally, but potentially do completely the wrong thing". If you have code that does something like:
for (const f of await readdir('.')) {
  doSomethingWithFile(f)
}
Then it's going to potentially be a juicy security target if I get that code to run after managing to create
If the goal is just making it easier to have something like Or, honestly, just telling people "wrap Perhaps it could also be worthwhile to add a
import { pathToFileURL } from 'node:url';

// Accept either a path string, a file: URL string, or a URL instance,
// and always hand back a file: URL instance.
const pathOrFileURL = (input: string | URL): URL => {
  if (input instanceof URL) {
    if (input.protocol !== 'file:') throw new Error('not a file URL');
    return input;
  }
  return input.startsWith('file:') ? new URL(input) : pathToFileURL(input);
} |
There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document. |
There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment. For more information on how the project manages feature requests, please consult the feature request management document. |
Problem
Spinning off from #48740 (comment), many of our APIs such as fs.readFile accept URL instances (what you get from new URL) but not URL strings (like import.meta.url, or the return value of the soon-to-be-unflagged import.meta.resolve). In the case of many (all?) of these, all strings are interpreted as paths. This is frustrating since in ESM, we have easy access to URL strings such as import.meta.url but path strings require using helpers such as fileURLToPath.
Original Idea
Wherever feasible, all Node.js APIs that can accept URL strings should do so. In particular this is most relevant to the fs APIs, especially the ones that already accept URL instances. To avoid ambiguity with path strings, such APIs should only interpret URL strings that begin with file:.
I presume that this would be a semver-major change, to avoid needing to first check for the existence of a file or folder named file: in the local path; or perhaps we could add such a check now in order to land this and backport it, and remove such a check in a semver major.
We would also need to consider the security implications. Per @aduh95:
I don’t really see how this is a security concern, but I concede that there might be issues to consider. Perhaps some can be addressed via permissions or policies. I do feel however that since URL strings are so prevalent in ESM, we should require a high bar for security concerns to outweigh usability for this feature.
cc @nodejs/loaders @nodejs/modules @nodejs/security
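For reference, the conversions that the Problem section describes look roughly like this today. It is a minimal sketch, assuming a POSIX system. The moduleUrl string is a hypothetical stand-in for import.meta.url, and data.json is a made-up sibling file that is never actually read.

```javascript
import { fileURLToPath, pathToFileURL } from 'node:url';

// Hypothetical stand-in for import.meta.url, which is always a string.
const moduleUrl = 'file:///project/src/index.mjs';

// Option 1: build a URL instance, which fs functions already accept.
const dataUrl = new URL('./data.json', moduleUrl);

// Option 2: convert to a path string with the helper from node:url.
const dataPath = fileURLToPath(dataUrl);

// Going back from a path string to a URL is a separate helper.
const roundTrip = pathToFileURL(dataPath);

console.log(dataUrl.href);
console.log(dataPath);
console.log(roundTrip.href === dataUrl.href);
```

Passing moduleUrl itself to an fs function would treat the whole string as a relative path, which is the friction the issue is about.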