The USDA provides the National Nutrient Database for Standard Reference, currently in Version 23 (SR23), which has loads of nutritional data on a good selection of food products. On my wishlist would be cross-referencing with UPC bar codes, as well as some additional data that are currently missing, such as glycemic index and separate soluble vs. insoluble fiber values.
I did find one website that has done a pretty good job with glycemic index already: nutritiondata.self.com. In addition to pulling in glycemic index data, they've also calculated glycemic load and then built a statistical model to try to infer/impute glycemic load for foods that don't have published glycemic index data. They also have some other summary statistics that they've developed in-house, such as "Fullness Factor" and an overall healthiness rating. Kudos to them, but I wanted to play with these data myself. Also, I didn't see a fiber breakdown there.
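For reference, glycemic load is usually defined as glycemic index times the grams of available carbohydrate in a serving, divided by 100, where available carbohydrate is total carbohydrate minus fiber. Here is a minimal sketch of that arithmetic. The function names and the example numbers are mine and purely illustrative, and I'm not claiming this is exactly how nutritiondata.self.com computes it.

```python
def available_carbs_g(total_carbs_g, fiber_g):
    """Available (net) carbohydrate: total carbohydrate minus fiber, floored at zero."""
    return max(total_carbs_g - fiber_g, 0.0)


def glycemic_load(glycemic_index, carbs_g):
    """Glycemic load for one serving: GI * available carbohydrate (g) / 100."""
    return glycemic_index * carbs_g / 100.0


# Made-up serving: GI of 50, 34 g total carbohydrate, 4 g fiber.
carbs = available_carbs_g(34.0, 4.0)   # 30 g available carbohydrate
example_gl = glycemic_load(50.0, carbs)
print(example_gl)
```

Note that glycemic load depends on serving size, so two tables can only be compared if they use the same portion.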
One data source I found that has both glycemic load and soluble vs. insoluble fiber (at least for some entries) is the Diet History Questionnaire II (DHQ2). The DHQ is available to download as a single giant table. I plan to play with this.
Another option is to link the glycemic index data derived for the DHQ back to SR23. This appears to be possible thanks to a couple of additional tables. First, the DHQ provides its glycemic index data in a table referenced against food codes from the USDA Continuing Survey of Food Intakes by Individuals (CSFII). These codes match the food codes used in the USDA Food and Nutrient Database for Dietary Studies (FNDDS). The FNDDS database, in turn, includes a table that links selected FNDDS/CSFII codes back to one or more SR23 codes. (Since my main interest in glycemic index is to compare it to sugar composition, and FNDDS only has a value for total sugar, not individual components, I'm not too interested in FNDDS by itself.)
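The linking chain described above (DHQ glycemic index by food code, then food code to SR code) can be sketched roughly as follows. This is only a toy illustration in plain Python: the codes, the values, and the names `dhq_gi`, `fndds_sr_links` and `link_gi_to_sr` are all made up, and the real tables will have their own layouts and column names.

```python
from collections import defaultdict

# Hypothetical DHQ-style table: CSFII/FNDDS food code -> glycemic index.
dhq_gi = {
    "11111000": 40.0,
    "22222000": 70.0,
}

# Hypothetical FNDDS-style link table: (food code, SR code) pairs.
# One food code can point at several SR codes, and vice versa.
fndds_sr_links = [
    ("11111000", "01001"),
    ("22222000", "18001"),
    ("22222000", "18002"),
    ("11111000", "18001"),
    ("33333000", "09001"),  # food code with no glycemic index in the DHQ table
]


def link_gi_to_sr(gi_by_food_code, links):
    """Collect the candidate glycemic index values that reach each SR code."""
    gi_by_sr = defaultdict(list)
    for food_code, sr_code in links:
        gi = gi_by_food_code.get(food_code)
        if gi is not None:  # skip food codes without a glycemic index
            gi_by_sr[sr_code].append(gi)
    return dict(gi_by_sr)


candidates = link_gi_to_sr(dhq_gi, fndds_sr_links)

# Naive choice (my assumption, not something the databases prescribe):
# average the candidates when several food codes reach the same SR code.
mean_gi = {sr: sum(vals) / len(vals) for sr, vals in candidates.items()}
print(mean_gi)
```

Because the links are not one-to-one, how to settle on a single value per SR entry is a judgment call. Keeping only the food codes that map to exactly one SR code would be a more conservative alternative to averaging.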
I'll be exploring DHQ vs. SR23 (augmented with glycemic indices linked from FNDDS+DHQ) and will report back.