Larry Smarr | Director of the California Institute for Telecommunications and Information Technology (Calit2)
Abstract: The human body is host to 100 trillion microorganisms, ten times the number of cells in the human body, and these microbes contain 300 times the number of genes that our human DNA does. The microbial component of our “superorganism” comprises hundreds of species with immense biodiversity. Thanks to the National Institutes of Health’s Human Microbiome Program, researchers have been discovering the states of the human microbiome in health and disease. To put a more personal face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last ten years, which reveals detailed examples of the episodic evolution of this coupled immune-microbial system. An elaborate software pipeline, running on high performance computers, reveals the details of the microbial ecology and its genetic components. A variety of data science techniques are used to pull biomedical insights from this large data set. We can look forward to revolutionary changes in medical practice over the next decade.
Bio: Larry Smarr is the founding Director of the California Institute for Telecommunications and Information Technology (Calit2), a UC San Diego/UC Irvine partnership, and holds the Harry E. Gruber professorship in the Department of Computer Science and Engineering (CSE) of UCSD’s Jacobs School of Engineering. Before that he was the founding director of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. He is a member of the National Academy of Engineering, as well as a Fellow of the American Physical Society and the American Academy of Arts and Sciences. In 2006 he received the IEEE Computer Society Tsutomu Kanai Award for his lifetime achievements in distributed computing systems and in 2014 the Golden Goose Award. He served on the NASA Advisory Council to 4 NASA Administrators, was chair of the NASA Information Technology Infrastructure Committee and the NSF Advisory Committee on Cyberinfrastructure, a member of the DOE Advanced Scientific Computing Advisory Committee and ESnet Policy Board, and for 8 years he was a member of the NIH Advisory Committee to the NIH Director, serving 3 directors. His personal interests include growing orchids, snorkeling coral reefs, and quantifying the state of his body.
Pedro Alves | Director of Data Science at Sentient Technologies
Practical Data Science
Abstract: This talk will cover practical points in data science that come up in industry as well as some real-world examples of projects and problems. The problems discussed in this talk will focus on aspects of projects that are usually not taught in data science courses but are where a considerable portion of work is done. This talk will also briefly cover the history and theory behind ensembles and why they work so well.
Biography: Pedro has experience in predicting, analyzing and visualizing data in fields including genomics, gene networks, cancer metastasis, insurance fraud/costs, hospital readmissions, soccer strategies, joint injuries, social graphs, human attraction, spam detection and topic modeling. Pedro is incredibly passionate about all aspects of data science and is constantly creating new techniques and algorithms to suit the problems at hand. At Banjo, his efforts were geared towards detecting and interpreting everything that is happening in the world in real time, from major concerts and sporting events to major and minor news. Now he leads the data science efforts at Sentient.ai, where the team uses evolutionary algorithms and massively scaled deep learning to solve problems such as trading and visual comprehension of consumer products.
Woodburn Hall 200
Mauro Martino | Cognitive Visualization Lab at IBM Watson Cambridge, MA
Point, Line & Data: New methods for understanding complex data, from storytelling to machine learning
Abstract: The aesthetics of science is changing: the diffusion of data visualization tools is enabling a revival of beauty in scientific research. More and more papers are presented with seductive images, convincing videos, and sharp interactive tools. Scientific storytelling will be discussed with 2 case studies: “Charting Culture, 2014”, and “Rise of partisanship, 2015”. In the second part of the talk we explore the connection between Machine Learning & Data Visualization. We will look at 3 projects together: News Explorer (exploration of real-time news), Ted Watson (exploration of a large corpus of videos), and Watson 500 (the analysis of relationships between entities and topics in a specific corpus of data). We encourage the public to use these tools before the talk:
Biography: Mauro Martino is an Italian expert in data visualization based in Boston. He created and leads the Cognitive Visualization Lab at IBM Watson in Cambridge, Massachusetts, USA. Martino’s data visualizations have been published in the scientific journals Nature, Science, and the Proceedings of the National Academy of Sciences. His projects have been shown at international festivals including Ars Electronica, and art galleries including the Serpentine Gallery, UK, GAFTA, USA, and the Lincoln Center, USA.
Jointly organized by the Data Science program and the Cyberinfrastructure for Network Science Center, this talk is partially supported by Indiana University’s Consortium for the Study of Religion, Ethics and Society, a consortium sponsored by the Vice President for Research Office.
Talk details can be found at http://cns.iu.edu/cnstalks. All talks will take place in the new Social Science Research Commons, Woodburn Hall 200 (unless otherwise noted).
Abe Weston | Associate Principal performing in Fraud Analytics
Predictive Modeling of Pay-Per-Click Keywords Bid Value
Abstract: Pay-Per-Click (PPC) keywords are purchased by advertisers, preferably at minimum cost, in order to maximize profit. Data sets with predictors are generated using data collected from the Google Search API for Shopping and the Microsoft Ad Intelligence Service. Machine learning is applied to successfully predict the future bid value.
Biography: Abe Weston works for Google as an Associate Principal performing in fraud analytics. He has an undergraduate degree in Industrial Technology, MS degrees in Telecommunications Systems and Data Mining, and a graduate certificate in Statistics. He enjoys traveling, camping, reading, hiking, biking, yoga, playing guitar, and spending time with his wife and three kids.
Woodburn Hall 200
Kalev Leetaru | Senior Fellow at the George Washington University Center for Cyber & Homeland Security
Quantifying, Visualizing, and Forecasting Global Human Society Through “Big Data”: What it Looks Like To Compute on the Entire Planet
Abstract: Put simply, the GDELT Project is a realtime index over global human society, inventorying the world’s events, emotions, and narratives as they happen. GDELT live machine translates the world’s information across 65 languages and identifies the planet’s events, counts, quotes, people, organizations, locations, millions of themes and thousands of emotions, imagery, video, and social posts, creating a massive realtime global graph. Here’s what it looks like to conduct data analytics at a truly planetary scale.
Biography: One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, Kalev Leetaru is a Senior Fellow at the George Washington University Center for Cyber & Homeland Security and a member of its Counterterrorism and Intelligence Task Force, as well as being a 2015-2016 Google Developer Expert for Google Cloud Platform. From 2013 to 2014 he was the Yahoo! Fellow in Residence of International Values, Communications Technology & the Global Internet at Georgetown University’s Edmund A. Walsh School of Foreign Service, where he was also an Adjunct Assistant Professor, as well as a Council Member of the World Economic Forum’s Global Agenda Council on the Future of Government. Featured in the presses of more than 100 nations and from Nature to the New York Times, his work focuses on how innovative applications of the world’s largest datasets, computing platforms, algorithms and mindsets can reimagine the way we understand and interact with our global world. More on his latest projects can be found on his website at http://www.kalevleetaru.com/ or http://blog.gdeltproject.org.
Jointly organized by the Data Science program and the Cyberinfrastructure for Network Science Center, this talk is partially supported by Indiana University’s Consortium for the Study of Religion, Ethics and Society, a consortium sponsored by the Vice President for Research Office.
Jure Leskovec | Assistant Professor of Computer Science at Stanford University
Machine Learning for Human Decision Making
Abstract: In many real-life settings human judges make decisions and choose among many alternatives: a medical doctor deciding on a treatment for a patient, a criminal court judge making a decision about a defendant, a crowd-worker labeling an image, or a student answering a multiple-choice question. Gaining insights into human decision making is important for determining the quality of individual decisions as well as identifying human mistakes and biases.
In this talk we discuss the question of developing machine learning methodology for estimating the quality of individual judges and obtaining diagnostic insights into how various judges decide on different kinds of items. We develop a series of increasingly powerful hierarchical Bayesian models, which infer latent groups of judges and items with the goal of obtaining insights into the underlying decision process. We apply our framework to a wide range of real-world domains, and demonstrate that our approach can accurately predict judges’ decisions, diagnose types of mistakes judges tend to make, and infer true labels of items.
Bio: Jure Leskovec is assistant professor of Computer Science at Stanford University and chief scientist at Pinterest. Computation over massive data is at the heart of his research and has applications in computer science, social sciences, economics, marketing, and healthcare. This research has won several awards including a Lagrange Prize, Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, and numerous best paper awards. Leskovec received his bachelor’s degree in computer science from University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and completed postdoctoral training at Cornell University. You can follow him on Twitter @jure.
Daniel Katz | Program Director in the Division of Advanced Cyberinfrastructure at the National Science Foundation
Scientific Software Challenges and Community Responses
Abstract: As the process of science has become increasingly digital, scientific outputs and products have grown beyond simple papers and books to include software, data, and other electronic components. Scientific knowledge is embedded in these components. And papers and books themselves are also becoming increasingly digital, allowing them to be executable and reproducible. As we move towards this future where science is performed in and recorded as a variety of linked digital products, the characteristics and properties that developed for books and papers need to be applied to all digital products. This talk will discuss a number of the related challenges that face the computational and data scientific software community, including how researchers find software, how developers get credit for writing software, and how software can be sustained, many of which have parallels with data. It will also discuss how different members of the community are individually and collectively trying to address these challenges.
Biography: Daniel S. Katz is a Program Director in the Division of Advanced Cyberinfrastructure at the National Science Foundation. His interest is in the development and use of advanced cyberinfrastructure to solve challenging problems at multiple scales. His technical research interests are in applications, algorithms, fault tolerance, and programming in parallel and distributed computing, including HPC, Grid, Cloud, etc. He is also interested in policy issues, including citation and credit mechanisms and practices associated with software and data, organization and community practices for collaboration, and career paths for computing researchers. He received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from Northwestern University, Evanston, Illinois, in 1988, 1990, and 1994, respectively.
Michael Conover | Staff Data Scientist at LinkedIn
Building Machine Learning Projects
Abstract: In this talk, we’ll work through some of the challenges faced by LinkedIn’s machine learning research & development teams in building and shipping intelligent systems that make sense of the world’s economy. From creating novel training data and productionizing models with complex structure to evangelizing and evaluating the results of unsupervised algorithms, this talk will examine real-world case studies describing some of the hardest and most interesting challenges faced by one of the world’s largest technology companies.
Biography: Mike Conover builds machine learning technologies that leverage the behavior and relationships of hundreds of millions of people. A staff data scientist at LinkedIn and Indiana University alum, Mike has a Ph.D. in complex systems analysis with a focus on information propagation in large-scale social networks.
Wayne Pan | Chief medical officer for Applied Research Works
Understanding the Business Cases in Healthcare Delivery Systems for Data-driven Discovery
Abstract: The US healthcare system is in the midst of transformative change in response to new payment models which transfer risk from traditional payers to providers, in order to better align incentives to support higher quality, lower cost and improved health outcomes. The most successful organizations will leverage insights from their operational and clinical data, using advanced analytical techniques, which will enable them to excel in managing risk, providers, and patients. This talk will review business cases for data-driven discovery in the healthcare delivery models of tomorrow.
What data issues will these new organizations be facing? What are the new questions that need to be addressed? How will the use of data from EHRs, HIEs, and patient-derived data from the IoT change the way clinical medicine is practiced? How will data-driven discovery re-define our concept of personalized medicine?
Biography: Wayne is the chief medical officer for Applied Research Works (ARW), based in Palo Alto, California. ARW is a healthcare technology company that has developed a simple, yet effective cloud-based, physician and patient behavioral change platform that is being used in several health plans and medical groups across the US to drive improved performance in healthcare quality, better patient experience and clinical outcomes. He was formerly the Chief Medical Officer at Santa Clara County IPA (SCCIPA), a large multispecialty physician group located in Santa Clara County, California with 800 physicians serving 80,000 patients in commercial (HMO/ACO) and Medicare Advantage (HMO/ACO) programs.
Wayne has over 20 years of broad healthcare industry experience, from clinical medicine to managed care and health information technology. After 5 years of clinical practice as a fellowship-trained orthopaedic hand surgeon, Wayne served as Chief Medical Officer for several San Francisco Bay Area Medicaid managed care plans, where he started two Medicare Advantage Dual Eligible Special Needs Plans. Wayne also served as Chief Medical Officer of two local IPAs, as a Chief Medical Informatics Officer of a healthcare software company and as an Advisory Chief Medical Officer at a data analytics start-up focused on big data issues in healthcare. Wayne completed his undergraduate studies in Biology at Johns Hopkins University, his MD and PhD degrees concurrently at Mt. Sinai School of Medicine and his MBA at the Wharton School of the University of Pennsylvania. He did his post-graduate clinical training in Orthopaedic Surgery at Thomas Jefferson University Hospitals and Clinics and his fellowship in hand and microsurgery at the Philadelphia Hand Center.
Indiana Memorial Union, Oak Room
Dashun Wang | Assistant Professor of Information Sciences and Technology at the Pennsylvania State University
Understanding Success in Science and Technology
Abstract: Our current approach to success is driven by the belief that predicting exceptional impact requires us to detect extraordinary ability. Despite the long-standing interest in the problem, even experts remain notoriously bad at predicting long-term impact. Success reveals predictable patterns, however, if we start to see it not as an individual but a collective phenomenon: for something to be successful, it is not enough to be novel or appealing, but we all must agree that it is worthy of praise. If we accept the collective nature of success, its signatures can be uncovered from the many pieces of data around us using the tools of network and data sciences. In this talk, I will touch on three different examples of success spanning across science and technology, hoping to illustrate a series of fundamental mechanisms governing success. The uncovered patterns in these studies not only document new degrees of regularities underlying the often noisy and unpredictable complex systems, they also offer reliable measures of influence that may hold direct policy implications.
Biography: Dashun Wang is Assistant Professor of Information Sciences and Technology at the Pennsylvania State University and Adjunct Assistant Professor of Physics at Northeastern University. Prior to joining Penn State, he was a Research Staff Member at the IBM T.J. Watson Research Center. Dashun received his PhD in Physics from Northeastern University, where he was a member of the Center for Complex Network Research. From 2009 to 2013, he also held an affiliation with Dana-Farber Cancer Institute, Harvard University as a Research Associate. He received his B.S. degree in Physics from Fudan University in 2007. Dashun is a recipient of the AFOSR Young Investigator Award (2016).
Dashun leads a group of highly interdisciplinary researchers who are extremely passionate about data. Through the lens of new and increasingly available large-scale datasets, he aims to use and develop tools of network science to help improve the way in which we understand the interconnectedness of the social, technical, and business world around us. His research has been published in both general audience journals and top computer science venues, and has been featured in Nature, Science, The Economist, MIT Technology Review, The Boston Globe, ORF, and Physics World, among other major global media outlets.