0. Data Mining Techniques in the Analysis of Massive Data Sets
Usama Fayyad, Ph.D.
Chief Data Officer & Sr. VP, Research and Strategic Data Solutions
Yahoo! Inc.

1. Overview of this talk
- Introduction to data mining
- Case studies in science data (work done while at NASA JPL, the Jet Propulsion Lab, Pasadena, CA, USA)
  - Collaboration with Caltech Astronomy
  - Collaboration with Brown University Planetary Geology
- Case study dealing with numbers of consumers: data for building out the new sciences of the Internet
- If we have time (unlikely): actual case studies from targeting applications

2. What is data mining?
Finding interesting structure in data.
- Structure refers to statistical patterns, predictive models, and hidden relationships
Examples of tasks addressed by data mining:
- Predictive modeling (classification, regression)
- Segmentation (data clustering)
- Affinity (summarization): relations between fields, associations, visualization

3. Example: cataloging sky objects
Data mining based solution:
- Automated recognition of sky objects (over a billion objects)
- 94% accuracy in recognizing sky objects
- Classify objects that are at least one magnitude fainter than the state of the art
- Tripled the data yield
- Generate sky catalogs with much richer content, on the order of billions of objects: 2x10^7
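
To make the classification task and the accuracy figure on the slides concrete, here is a minimal sketch of how a classifier is trained and then scored on held-out objects. Everything in it is an assumption made for illustration: the data is synthetic, the two features and the "star"/"galaxy" labels are hypothetical stand-ins for catalog attributes, and the nearest-centroid rule is a deliberately simple substitute, not the method used in the sky-cataloging work.

```python
import random


def make_objects(n, seed=0):
    """Synthetic stand-in for catalog entries: a list of (features, label)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["star", "galaxy"])
        # Hypothetical class centers in a 2-feature space.
        center = (1.0, 1.0) if label == "star" else (4.0, 3.0)
        features = tuple(rng.gauss(c, 0.7) for c in center)
        data.append((features, label))
    return data


def fit_centroids(train):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, test):
    """Fraction of held-out objects whose predicted label matches the true one."""
    hits = sum(predict(centroids, f) == lab for f, lab in test)
    return hits / len(test)


objects = make_objects(400)
train, test = objects[:300], objects[300:]  # fit on 300, score on the other 100
centroids = fit_centroids(train)
acc = accuracy(centroids, test)
print(f"held-out accuracy: {acc:.2%}")
```

The point of the split is that accuracy is measured only on objects the classifier never saw during fitting, which is the sense in which a figure such as 94% is meaningful for a catalog.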