Good research data is hard to find
http://commoncrawl.org/ - Common Crawl builds and maintains an open repository of web crawl data that anyone can access and analyse.
https://www.data.gov/ - The U.S. Government's open data
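
As a rough illustration of what "accessed by anyone" can look like for Common Crawl, the sketch below builds a lookup URL for its public URL index. The index host, the `url` and `output` query parameters, and the crawl ID are assumptions that do not come from the list above, and `build_index_query` is a hypothetical helper name; check the Common Crawl site for the current crawl IDs before relying on it.

```python
from urllib.parse import urlencode

# Assumption: Common Crawl's public URL index server (not named in the list above).
INDEX_HOST = "https://index.commoncrawl.org"


def build_index_query(crawl_id: str, url_pattern: str) -> str:
    """Return a URL asking one crawl's index for captures matching url_pattern."""
    # urlencode percent-escapes the pattern so it is safe inside a query string.
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


if __name__ == "__main__":
    # "CC-MAIN-2023-50" is an illustrative crawl ID, used here only as an example.
    print(build_index_query("CC-MAIN-2023-50", "example.com"))
```

The function only builds the URL and makes no network request, so the result can be pasted into a browser or handed to any HTTP client.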