Tuesday, April 24, 2007
A thought I had a few days ago about Distributed Denial of Service...
DDoS is usually carried out by a botnet that is ordered to hit a certain website all at once in order to choke its bandwidth. This kind of attack is almost unstoppable, since there is usually no way of telling which page requests came from legitimate users and which came from bots.
But what if someone found a way to run a JS script using XSS on a very big website with tens of thousands of hits per day? What if that script deferred a small background routine that continuously fires simple XMLHTTP requests at a certain page?
Monday, April 23, 2007
Cheating on Bandwidth with PHP
Note: I am NOT responsible for anything that could happen if you actually try this (especially if your web hosting service decides to sue you).
PHP is a very fun scripting language. Besides the standard behaviour expected from an honest script (connecting to a database, parsing the information and displaying it to the user), PHP can do a few tricks as well. One of them is creating a listening socket and forking a new process.
If your web hosting allows you to fork and create a listening socket through PHP, you might just be able to do some nasty things so that your visitors download content from another port on the server instead of through the Apache web server. Traffic downloaded from your site through Apache gets summed up and limited, usually to a fixed amount per month depending on your hosting package. If you want more bandwidth, you have to pay more.
But what if you could get your site's visitors to download the content from a process forked by PHP that listens on a specific port and acts like a web server? The user would still manage to download content from your site, but the bandwidth would not be accounted for.
The algorithm:
- Use mod_rewrite on specific directories to pass the requested file name to a PHP script.
- The PHP script will either create the download link or send a redirect header of the form http://www.yoursite.com:[random_port], where random_port is a number between 1024 and 65535 (ports below 1024 are privileged).
- At the same time the link is created, fork a small temporary daemon that will run in the background and wait for a connection.
- The client will attempt to download the file through the chosen port.
- The forked PHP script will parse the HTTP request and send the requested file back to the client. The HTTP request will be a very simple one (something like "GET / HTTP/1.1" and a few more insignificant headers), since we already know exactly which file to send (the file name was one of the parameters passed to our script).
- It is possible to leave the forked daemon running, but that would be really nasty :)
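A minimal sketch of what the PHP side might look like, assuming the sockets and pcntl extensions are available and forking is permitted. The serve.php name, the downloads directory and the domain are placeholders for illustration, not anything you should expect to exist:

<?php
// Rough sketch of steps 2-5 of the list above, assuming the sockets and
// pcntl extensions are enabled (pcntl_fork() normally only works in the
// CGI/CLI SAPIs, not as an Apache module). The script name, the downloads
// directory and the domain are placeholders. Step 1 would be something
// like: RewriteRule ^downloads/(.+)$ /serve.php?file=$1 [L]
$file = basename($_GET['file']);            // crude sanitation of the requested name
$path = '/path/to/downloads/' . $file;      // hypothetical directory with the real files
$port = rand(1024, 65535);                  // unprivileged port for the one-shot daemon

$pid = pcntl_fork();
if ($pid == 0) {
    // Child process: behave like a tiny one-shot web server on the random port.
    $sock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_option($sock, SOL_SOCKET, SO_REUSEADDR, 1);
    socket_bind($sock, '0.0.0.0', $port);
    socket_listen($sock, 1);
    $client = socket_accept($sock);         // wait for the redirected visitor
    socket_read($client, 4096);             // read and ignore the simple GET request
    $data    = file_get_contents($path);
    $headers = "HTTP/1.1 200 OK\r\n"
             . "Content-Type: application/octet-stream\r\n"
             . "Content-Length: " . strlen($data) . "\r\n\r\n";
    socket_write($client, $headers . $data); // the file never passes through Apache
    socket_close($client);
    socket_close($sock);
    exit(0);                                 // one download served, daemon gone
}

// Parent process: redirect the visitor to the port the child is listening on.
header('Location: http://www.yoursite.com:' . $port . '/' . $file);

Note that this races the visitor's browser against the child's call to socket_bind(), and a random port may already be taken; it is only meant to illustrate the flow.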
The most obvious reason this usually won't work is PHP's security settings, which will not allow this hack. Other than that, most servers today firewall incoming connections, especially on unprivileged ports. But if your hosting is lame enough, you might actually pull it off.
WYSIWYG Editors Suck, Use WYSIWYM Editors
The basic problem today with developing or using Content Management Systems, blogs, etc. is that we want to give content writers a flexible editor with features such as bullets, bold text, different font sizes, and so on. On the other hand, we want a strict CSS design for the website, so that the stylesheet determines how the content is displayed in a unified manner. With WYSIWYM editors both goals can be achieved, since they generate strict, standard XHTML that was designed specifically for this purpose.
Anyways, here is an excellent article about why WYSIWYM editors kick ass:
http://www.456bereastreet.com/archive/200612/forget_wysiwyg_editors_use_wysiwym_instead/
Friday, April 20, 2007
Web 2.0 - Beware!
The new AJAX approach to web design is fun and fascinating, but dangerous at the same time. The main problem with AJAX is that you can't index your site easily: if most of your website content is generated dynamically in the page using AJAX, search engines will NOT be able to index it. This is supposed to be figured out at some point, and I'm sure Google is already working on a Javascript / browser simulator to solve it. But Google's solution to dynamic content will never be perfect, because Web 2.0 content usually relies on human interaction.
The more concerning issue about Web 2.0 is content stealing. Since the basic idea behind AJAX is client-side data processing (which gives the web much more flexibility), the data received at the client is plaintext and can usually be parsed in a simple manner (XML or CSV). The problem is that AJAX-driven webpages become very easy to reverse engineer because so little security is involved. It is much harder to reverse engineer a program and figure out how it talks to its remote server, or to parse data yourself out of a server-side web application. Stealing a webpage written with AJAX can be as simple as copy-pasting functions from the original page.
So how can these applications be protected?
First of all, by obfuscating the data and the code itself. There are programs that know how to do this, and it can be very helpful against the most common and lamest hackers around. Data obfuscation can be achieved with a simple encryption that is hard to understand but easy to process in Javascript.
The data source itself can also be protected with a referrer check: if the AJAX request came from an unknown page, the service can block it. But this, too, is easily bypassed by forging the Referer header, whether from the client or from servers that rip the data off the service.
The best technique for protecting AJAX services is a session: either login cookies that the AJAX requests carry with them, or server-generated random values that the Javascript passes back manually (the same idea, except it does not need cookie support and is a bit harder to implement). This is exactly the technique used to protect sites from unauthorized users, only here the login sequence happens automatically once you enter the main page.
Of course, temporary session cookies alone are not enough to protect an AJAX site, since a scraper can add one more request that extracts the session from the main page automatically. But doing that from the client is usually a difficult task, about as difficult as ripping an ordinary site would be, which is exactly what we wanted to achieve.
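Here is a minimal sketch of the referrer check and the token handshake described above, assuming a PHP backend. The data.php endpoint, the domain and the token scheme are placeholders for illustration:

<?php
// Minimal sketch of the session-token handshake. The data.php endpoint,
// the domain and the token scheme are placeholders.
session_start();

// --- On the main page: generate a random token and hand it to the Javascript. ---
if (empty($_SESSION['ajax_token'])) {
    $_SESSION['ajax_token'] = md5(uniqid(rand(), true));
}
echo '<script>var ajaxToken = "' . $_SESSION['ajax_token'] . '";</script>';
// The page's Javascript then appends "&token=" + ajaxToken to every
// XMLHTTP request it sends to data.php.

// --- In data.php: refuse requests that lack the token or come from elsewhere. ---
$referrerOk = isset($_SERVER['HTTP_REFERER'])
           && strpos($_SERVER['HTTP_REFERER'], 'http://www.yoursite.com/') === 0;
$tokenOk    = isset($_GET['token'], $_SESSION['ajax_token'])
           && $_GET['token'] === $_SESSION['ajax_token'];

if (!$referrerOk || !$tokenOk) {
    header('HTTP/1.1 403 Forbidden');        // unknown page or missing token: block it
    exit;
}
// ...otherwise build and return the XML / CSV data as usual.

The cookie-only variant is even simpler: the endpoint just checks that a valid logged-in session exists, since the browser attaches the session cookie to XMLHTTP requests on its own.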
Thursday, April 19, 2007
SEO?!
SEO (usually rhymes with the word Google) is Search Engine Optimization, the technique (or art) of getting search engines to like your site more.
This is not an article about SEO, but more food for thought about it. A lot of people make very good money from the internet, and SEO is a big part of it. If you want people to visit your site and buy from you or click on your ads, you need people to know your site exists. Now, people usually think that the more you pay Google, the higher you climb in Google's search results. Well, the truth is, Google are good people: they decide who's really the best, not who pays them the most, using several techniques, mainly PageRank and search relevance. That's why many people pay a lot of money for others to do SEO on their sites, because they can't pay Google to do the same. The cool thing about SEO is that it can multiply your revenues by a factor of 10, sometimes even more, just because you show up a few places higher than you used to.
Google knows what to find because they're just smart. Smart people know how minds think and act, and that's how the magic is done. Google turns web pages into "mathematical equations" (well, I can't find a better way to put it), and pages are indexed using keywords, depending on where the keywords appear in the page and what they look like (for example, keywords in titles carry more weight than keywords in small text). But Google also wants to show you the most relevant pages by ranking higher the pages it thinks are better, and that's what makes Google the best search engine around.
Google's method of knowing which pages are better than others (and therefore more relevant to the search) is Google PageRank. PageRank is a number from 0 to 10 that indicates how good and informative your page actually is. It is recalculated once in a while for each page using a lot of variables, such as how original the site's content is, how often it is updated, how many people visit the page, and many, many more.
For example, if someone searches for "Flower shop", a page that might interest that person should contain the phrase "Flower shop". The more often the phrase appears in the page, the more relevant the page will be. But if the phrase shows up many times in the same area of the text, it won't count for much. On the other hand, if the phrase appears distinctly across very different paragraphs, it leaves a stronger signature in the page and becomes a better search keyword.
The most commonly known and talked-about technique for getting a page a better PageRank is increasing the number of backlinks to it. The more people link back to your page, the more popular it is, meaning it will get a higher PageRank. If the sites that link to yours have a high PageRank themselves, their "vote" counts even more, and your PageRank is affected more. Using this technique is very easy (but expensive): just pay people with very well-known websites to put a link to your site on theirs. But if your site is really good and interesting, people with blogs or sites of their own will link back to it, and you will get a higher PageRank without actually paying anyone to advertise your site.
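For the curious, and with the caveat that Google has never published the exact live formula, the original Brin and Page paper describes the underlying computation roughly as PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outgoing links on page T, and d is a damping factor usually set around 0.85. The 0-to-10 number shown in the toolbar is generally believed to be a scaled-down (roughly logarithmic) view of that raw score, which is why each extra point is so much harder to earn than the last.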
Beware: if you do bad things, you get PR0'ed (PageRank 0), which means you'll show up at the bottom of the search results.
So what is SEO, actually? It's nothing but a set of techniques to make search engines like you more. But in the end, SEO can be legal or illegal. Legal SEO is something everyone can do: read a few more articles about search engines, and you'll know how to make your site crawl up the Google ladder. Illegal SEO tricks fool search engines or exploit flaws to give a page a higher PageRank or wider relevance for no good reason. Bottom line: if you pay guys to do that kind of SEO for you, you might get PR0'ed, because Google can trace this kind of weird behaviour.
Finding files over the net
Lately I've been really upset with the fact that finding files over the internet is much harder than it should be. There are billions of files out there just waiting to be downloaded, and no one actually knows where they are...
Now you're probably gonna say "That's bull, you can just search for the file on Google and find it easily", but that's where you're wrong. You can only find files that are meant to be found. What does that mean? MP3 files, for example, are files that often aren't meant to be found, in the sense that certain search engines (Google, for example) could index them easily but won't. You may also think that search engines find links to files within pages, which is wrong as well, because they only index the text of the page, not the anchors themselves. That means you can only find files on the net by finding pages that lead you to them.
But what if you are looking for a file by its file name instead of a description? What about all the files posted in forums all over the world? You'd have to search for a description or a file name in a search engine, and then hunt through each page for a link to the file you are looking for (often presented as a very big "DOWNLOAD" link). This process can be annoying, frustrating and time-consuming.
So I got tired of looking for files over the internet myself. For a start, I wrote a Python-based web crawler that searches for a file using a search string, then takes each webpage in the search results and reads all the links from that page. That gave me a much quicker and more comfortable way of looking for files, without all the fuss of digging all over the net just to find a small file (whose exact name I usually already know, which is even more frustrating).
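As a rough illustration of the idea (sketched here in PHP rather than the original Python, with a made-up search URL and a hard-coded list of file extensions), the crawl boils down to fetching the search results, following each result page and filtering its links:

<?php
// A rough sketch of the crawl loop: the search engine URL and the file
// extensions are placeholders, and the author's real script is in Python.
$query      = 'some_file_name.zip';
$searchUrl  = 'http://www.example-search.com/search?q=' . urlencode($query);
$extensions = '(zip|mp3|dll|exe|pdf)';

// Pull the href attribute out of every anchor tag on a page.
function extract_links($html) {
    preg_match_all('/<a[^>]+href=["\']([^"\']+)["\']/i', $html, $matches);
    return $matches[1];
}

$resultsPage = @file_get_contents($searchUrl);
$resultLinks = $resultsPage ? extract_links($resultsPage) : array();

foreach (array_slice($resultLinks, 0, 20) as $pageUrl) {      // first 20 result pages
    $page = @file_get_contents($pageUrl);
    if ($page === false) {
        continue;                                             // unreachable page, skip it
    }
    foreach (extract_links($page) as $link) {
        if (preg_match('/\.' . $extensions . '$/i', $link)) { // looks like a direct file link
            echo $link . " (found on " . $pageUrl . ")\n";    // remember the source page, too
        }
    }
}

A real crawler obviously also needs politeness delays, relative-link resolution and duplicate filtering, but the core loop is just this.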
The script was very useful for downloading specific files that aren't downloaded often: drivers, DLLs, MP3s, old games, firmware, etc. So I decided to make a webpage out of it: http://www.findthatfile.com. I cache the results of the pages I crawl to speed the search up a bit, so certain download pages that generate download links on the fly (such as download.com) won't work. But I don't really care about those sites, since my crawler helps people find files that are buried in the depths of the internet, not shareware applications that are already indexed on commercial download sites.
And of course, each file links back to the page I found it on, because of copyright issues.