Security Of Cloud Shared Links
-
@Dashrender said in Security Of Cloud Shared Links:
@BRRABill said in Security Of Cloud Shared Links:
@StrongBad said
Spidering is defined as the following of links. If it is unlinked, by definition, a spider cannot find it.
That is what @scottalanmiller told me. (I think, don't want to put words in his mouth.)
If it's not linked, it can't be found except by brute force.
OK I guess I used the wrong term... Google definitely knows about new pages where links to that site don't exist yet, or much - and it brute forces those sites... and it is undoubtedly brute forcing major websites looking for new pages, not waiting for links to those to appear first.
@BRRABill and I were discussing this and this can't be possible. That would be illegal, in fact, as it would qualify as hacking. And it is technically impossible. Google and everyone else only follows published links.
-
To be sure, if you have a folder that is published or a generic name like "public" and it is listable, then you are self publishing those links through HTTP discovery, obviously. But that's publishing.
-
@StrongBad said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
Tons of pages aren't linked anywhere on any page, yet Google is aware of them because their spiders crawl all over the page doing ls commands looking for anything and everything.
ls commands? How would they do that? There isn't any ls command in HTTP.
again, you're probably right, it's not ls - but there is a way to crawl over a site via HTTP - I had software 15 years ago that I just pointed toward a URL and it would find all of the folder structure that it was allowed to get to, many not having links.
-
Now we're getting to debate my question!
I originally thought the same as @Dashrender, which is why I was concerned that the link would eventually be found.
But as you've seen, @scottalanmiller says that is impossible.
-
@scottalanmiller said in Security Of Cloud Shared Links:
To be sure, if you have a folder that is published or a generic name like "public" and it is listable, then you are self publishing those links through HTTP discovery, obviously. But that's publishing.
This is what I'm talking about.
-
@Dashrender said in Security Of Cloud Shared Links:
@StrongBad said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
Tons of pages aren't linked anywhere on any page, yet Google is aware of them because their spiders crawl all over the page doing ls commands looking for anything and everything.
ls commands? How would they do that? There isn't any ls command in HTTP.
again, you're probably right, it's not ls - but there is a way to crawl over a site via HTTP - I had software 15 years ago that I just pointed toward a URL and it would find all of the folder structure that it was allowed to get to, many not having links.
The folders present links via HTTP. Those are linked. Nothing nefarious or weird there, that's a published directory structure. You can do it by hand and see the links very clearly.
-
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
To be sure, if you have a folder that is published or a generic name like "public" and it is listable, then you are self publishing those links through HTTP discovery, obviously. But that's publishing.
This is what I'm talking about.
Right, so if there are links, Google can see them. Disable the display of the links, and Google cannot.
-
@BRRABill said in Security Of Cloud Shared Links:
Now we're getting to debate my question!
I originally thought the same as @Dashrender, which is why I was concerned that the link would eventually be found.
But as you've seen, @scottalanmiller says that is impossible.
But he doesn't - he and I are talking about the same thing - things that you self publish to HTTP are there and are findable without links from some place else.
-
@Dashrender said in Security Of Cloud Shared Links:
@BRRABill said in Security Of Cloud Shared Links:
Now we're getting to debate my question!
I originally thought the same as @Dashrender, which is why I was concerned that the link would eventually be found.
But as you've seen, @scottalanmiller says that is impossible.
But he doesn't - he and I are talking about the same thing - things that you self publish to HTTP are there and are findable without links from some place else.
No one said links from somewhere else. You are linking yourself to every file in the example that you are providing.
-
@scottalanmiller said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
To be sure, if you have a folder that is published or a generic name like "public" and it is listable, then you are self publishing those links through HTTP discovery, obviously. But that's publishing.
This is what I'm talking about.
Right, so if there are links, Google can see them. Disable the display of the links, and Google cannot.
I'm not entirely sure what you mean by display of links - can you be more specific in an explanation?
-
@scottalanmiller said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
@BRRABill said in Security Of Cloud Shared Links:
Now we're getting to debate my question!
I originally thought the same as @Dashrender, which is why I was concerned that the link would eventually be found.
But as you've seen, @scottalanmiller says that is impossible.
But he doesn't - he and I are talking about the same thing - things that you self publish to HTTP are there and are findable without links from some place else.
No one said links from somewhere else. You are linking yourself to every file in the example that you are providing.
So you're saying that every file in the www root directory on an IIS server is considered self published or more specifically.. self linked? Even if there is no link from any htm page that is on the site?
-
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
To be sure, if you have a folder that is published or a generic name like "public" and it is listable, then you are self publishing those links through HTTP discovery, obviously. But that's publishing.
This is what I'm talking about.
Right, so if there are links, Google can see them. Disable the display of the links, and Google cannot.
I'm not entirely sure what you mean by display of links - can you be more specific in an explanation?
We rarely see this today because no one does this, we use applications rather than straight files, but let's say you have a directory of HTML files under my.site.com/files/
You can set the web server to automatically generate a page as the default for that folder that displays each file in that folder as a link. This is not part of the web or of HTTP, but is a function that can be enabled in some web servers (but not all.) It's an "auto linking" feature that people often want. But the web server does this explicitly and makes a link to each resource, creating everything that spiders need to see all files, including a link to the next directory listing.
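To make the "auto linking" idea concrete, here is a minimal Python sketch of what such a feature does behind the scenes. It is not how any particular web server implements it; the function name build_index_page and the /files/ path are made up for illustration.

```python
import html
import os
from urllib.parse import quote


def build_index_page(directory: str, url_path: str) -> str:
    """Return an HTML page that links to every entry in `directory`.

    Hypothetical helper: it mimics the kind of page a web server's
    directory-listing feature generates on the fly.
    """
    items = []
    for name in sorted(os.listdir(directory)):
        # A trailing slash on subdirectories makes the link lead to the
        # next generated listing rather than to a file.
        is_dir = os.path.isdir(os.path.join(directory, name))
        display = name + "/" if is_dir else name
        items.append(
            f'<li><a href="{quote(display)}">{html.escape(display)}</a></li>'
        )
    heading = f"<h1>Index of {html.escape(url_path)}</h1>"
    return heading + "\n<ul>\n" + "\n".join(items) + "\n</ul>"
```

Every file shows up in the output only because this code wrote an anchor tag for it. Turn the feature off and those links never exist.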
-
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
@Dashrender said in Security Of Cloud Shared Links:
@BRRABill said in Security Of Cloud Shared Links:
Now we're getting to debate my question!
I originally thought the same as @Dashrender, which is why I was concerned that the link would eventually be found.
But as you've seen, @scottalanmiller says that is impossible.
But he doesn't - he and I are talking about the same thing - things that you self publish to HTTP are there and are findable without links from some place else.
No one said links from somewhere else. You are linking yourself to every file in the example that you are providing.
So you're saying that every file in the www root directory on an IIS server is considered self published or more specifically.. self linked? Even if there is no link from any htm page that is on the site?
No, but you don't have that happening in your example. In your example, you are creating links to those resources. Most people do, as they want them spidered.
-
Even index.html isn't an exception to this, that is autolinked as a setting in the web server, too. Many of the links are by convention.
-
Example...
http://mirror.centos.org/centos/
That link doesn't go to a file, it goes to a directory. But what does your browser display? An HTML page automatically generated by the web server that links to all files and folders stored in that location. Follow the links to automatically generated pages with more links until you get to the file that you want. A spider following this is literally following links generated by an application making HTML pages behind the scenes to display the links to the end users (or spiders.) This is not intrinsic but is a "by convention" method of displaying static HTML folders and files, and it is very common not to use it (the web server that NodeBB uses doesn't even have this functionality.)
If you don't automatically make those links one way or another, the spider has nothing to follow.
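A rough sketch of the spider's side of this, in Python. Real crawlers are far more involved; the LinkCollector class, the discover function and the my.site.com URLs are all hypothetical. The point is only that the crawler's list of places to visit is built from the links it is handed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def discover(base_url, page_html):
    """Return the URLs a link-following spider could learn from one page."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return collector.links
```

Feed it a generated directory listing and it returns every file in the folder. Feed it a page with no anchors and it returns an empty list, so an unlinked folder never enters the queue.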
-
Things that people often miss that create links they don't know about are...
- Automatic links generated for default landing by the web server
- Automatic links generated for folder listing by the web server
- Sitemaps generated by the web server or application
- Robot directives directing spiders to specific resources
- RSS or Atom feeds of files
- Automatic linking by applications
-
@scottalanmiller said in Security Of Cloud Shared Links:
Even index.html isn't an exception to this, that is autolinked as a setting in the web server, too. Many of the links are by convention.
auto linked to what?
-
@Dashrender said in Security Of Cloud Shared Links:
@scottalanmiller said in Security Of Cloud Shared Links:
Even index.html isn't an exception to this, that is autolinked as a setting in the web server, too. Many of the links are by convention.
auto linked to what?
To whatever the default is set in the web server. Go to http://ntg.co/ (really, go to it, we need the hits) and the web server is informed to go "serve up the default link." In the case of that particular site, the default is set to index.php. You set this for all web servers. If you don't set it for Apache, it has a default setting of index.html and IIS has a built in default of index.htm.
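Roughly, the default document lookup works like the Python sketch below. This is an illustration only, not the actual logic of Apache or IIS; resolve_default and its index_names parameter are made-up names, and the list of candidate file names stands in for whatever the server is configured with.

```python
import os


def resolve_default(directory, index_names=("index.html",)):
    """Return the file served for a bare directory URL, or None.

    `index_names` is the configured list of default documents, tried in
    order. The value here is an assumed example configuration.
    """
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # No default document found: the server must fall back to something
    # else, such as a generated listing or an error.
    return None
```

So a request for the bare site URL is mapped to a file purely by that setting. A site configured like the one above would pass something like ("index.php",) instead.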
-
@scottalanmiller said in Security Of Cloud Shared Links:
Example...
http://mirror.centos.org/centos/
That link doesn't go to a file, it goes to a directory. But what does your browser display? An HTML page automatically generated by the web server that links to all files and folders stored in that location. Follow the links to automatically generated pages with more links until you get to the file that you want. A spider following this is literally following links generated by an application making HTML pages behind the scenes to display the links to the end users (or spiders.) This is not intrinsic but is a "by convention" method of displaying static HTML folders and files, and it is very common not to use it (the web server that NodeBB uses doesn't even have this functionality.)
If you don't automatically make those links one way or another, the spider has nothing to follow.
OK that makes sense. So if, in the case of NodeBB, it doesn't have the functionality, but there is a folder called /pictures123 that has pictures in it and there are no links to it, and no auto generation - yet if you know the exact URL, you're saying Google can't find that folder? And no legal entity can?
-
Right... the HTTP command set has no listing capability at all. That's not one of the HTTP commands. All directory listings or links of any sort have to be either included in a static file or created by the web server or something that talks to the web server (like a PHP site.)
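A small experiment along these lines, sketched with Python's built-in http.server. It assumes a throwaway local server on 127.0.0.1; NoListingHandler, fetch_status, demo and the cat.txt file are invented for the example, and /pictures123 is borrowed from the question above. The handler turns off the generated directory page, then the script asks for the folder and for a file whose exact URL is known.

```python
import http.client
import http.server
import os
import tempfile
import threading
from functools import partial


class NoListingHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler with the generated directory page disabled."""

    def list_directory(self, path):
        # Refuse to build the auto-generated listing.
        self.send_error(404, "No listing available")
        return None

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(port, path):
    """GET `path` from the local test server and return the status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def demo():
    """Return (status for the folder URL, status for the exact file URL)."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "pictures123"))
        with open(os.path.join(root, "pictures123", "cat.txt"), "w") as f:
            f.write("meow")
        handler = partial(NoListingHandler, directory=root)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        try:
            return (
                fetch_status(port, "/pictures123/"),
                fetch_status(port, "/pictures123/cat.txt"),
            )
        finally:
            server.shutdown()
            server.server_close()
```

With the listing disabled, the folder request gets an error while the exact file URL still works. The only way to reach cat.txt is to already know its name or to guess it.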