Kimberly Van Auken | Database Curator, WormBase and Gene Ontology Consortium
Data Curation in the Biomedical Sciences: from Text to Databases to Knowledge Discovery
Abstract: Knowledge discovery in the biomedical sciences depends on accurate, consistent representation of data in knowledgebases. Over the past two decades the biocuration community, including the Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) (http://geneontology.org/), has been at the forefront of biological knowledge management. The vast and rapidly increasing amount of biomedical data, however, makes biocuration a very challenging task. In my talk, I’ll discuss methods we’ve employed at WormBase (https://wormbase.org) and the GOC to help meet these challenges, with specific emphasis on the use of: 1) text mining tools, such as Textpresso (http://textpresso.org/), to identify suitable papers and evidence sentences for curation, 2) controlled vocabularies and ontologies to model biological data, and 3) data capture, visualization, and analysis tools to engage users and foster knowledge discovery.
Bio: Kimberly Van Auken, Ph.D. is a Database Curator for WormBase, the online database housing the genetics, genomics and biology of Caenorhabditis elegans and other nematodes. She serves as an ontology editor and co-manager of the Annotation Working Group for the Gene Ontology Consortium and is a member of the editorial board for ‘Database: The Journal of Biological Databases and Curation’. She holds a B.S. in Biochemistry from the University of Rochester, Rochester, NY and a Ph.D. in Molecular, Cellular, and Developmental Biology from the University of Colorado, Boulder.
This event will be held in Rm. 102 of Lindley Hall until further notice.
Laura Vetter | Chief Technologist at Kinney Group
Fast Time to Value – Splunk brings Machine Learning to Machine Data
Abstract: Splunk has been quietly developing use cases that give companies visibility into their machine data. Best known for its SIEM (security information and event management) tool, which is considered a leading product in the security industry, Splunk has more recently introduced products for IT Operations Analytics use cases. By using machine learning to automatically baseline “normal” versus “abnormal” activity throughout the data center, Splunk has been able to apply high-value data science to problems that were previously solved by anecdotal experience or gut instinct. Last year, Splunk launched its Machine Learning Toolkit, which allows Splunkers to leverage machine learning libraries to make reliable analyses and predictions with complex datasets. From linear regression to cluster analysis and outlier detection, Splunk provides the ability to automatically determine the accuracy of the models that are built. Splunk includes a development API where interested data scientists can plug in new ML algorithms and expose them in the toolkit. This talk will give you an overview of Splunk, what’s included in the toolkit, and how industry is using it. You will also see how customers with no data science experience can deploy the tool to apply machine learning to their data, and how to use the dev API. In addition, we will answer any questions about the use cases we have seen and lessons learned.
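The abstract does not describe how Splunk computes its baselines, so the following is only a generic illustration of the idea of separating “normal” from “abnormal” activity: a trailing-window z-score check in plain Python. It is not Splunk’s Machine Learning Toolkit or its API; the function name `flag_outliers` and its `window` and `threshold` defaults are hypothetical choices made for this sketch.

```python
import statistics
from typing import List, Sequence


def flag_outliers(values: Sequence[float], window: int = 5,
                  threshold: float = 3.0) -> List[bool]:
    """Flag points that deviate from a trailing baseline (hypothetical helper).

    The baseline for point i is the mean and sample standard deviation of
    the previous `window` points. A point is flagged as abnormal when it lies
    more than `threshold` standard deviations from that mean.
    """
    flags: List[bool] = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < 2:
            # Too little history to estimate a spread, so treat as normal.
            flags.append(False)
            continue
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # Perfectly flat baseline: any change at all is abnormal.
            flags.append(x != mean)
        else:
            flags.append(abs(x - mean) > threshold * stdev)
    return flags


if __name__ == "__main__":
    # Illustrative event counts per minute with one spike at index 6.
    counts = [10, 11, 10, 11, 10, 11, 50, 11]
    print(flag_outliers(counts))
```

One design consequence worth noting: because the baseline is a plain trailing mean, a spike inflates the baseline for the next few points, which can mask a second anomaly. Robust statistics such as the median and median absolute deviation are a common alternative.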
Bio: Laura Vetter is the VP of Analytics at Kinney Group, Inc., and one of the most influential leaders at the company. After graduating from Indiana University in 1997, Laura built a professional foundation in database and software engineering. Today, Laura is a Splunk Certified Consultant II and holds a CompTIA Security+ certification. She has been a critical driver of Kinney Group’s technical capabilities and has written numerous customer success stories. Her combination of natural intelligence, work ethic, and expertise has earned her tremendous respect from customers, partners, and colleagues alike.
This talk will be held in Lindley Hall Rm. 102.
Eric Ryszkiewicz | Outcomes Analyst, Palo Alto Medical Foundation
Data Science Applications in Healthcare: Improving Access, Performance, and Resource Deployment
Abstract: Large healthcare systems collect a wealth of data related to nearly all contact with patients, yet frequently struggle to translate it into useful knowledge. As organizations shift away from a fee-for-service model and towards a new paradigm of accountable care, it will be increasingly important that they can a) preserve or improve access for patients, and b) assess performance on scales ranging from the individual physician up to the entire system. Problems involving missing data, forecasts of operations or performance, and the added value of partnering with nearby institutions all present opportunities to apply novel solutions to some of the most challenging issues in the contemporary healthcare environment. In this talk, Eric will outline current efforts at PAMF to model and forecast access in primary and specialty care, as well as to assess the performance of surgeons and hospitalists operating in local hospitals. He will also discuss forecasting performed on behalf of a major solid organ transplant center in an effort to ensure regulatory compliance and outperform the competition on standard metrics.
Bio: Eric Ryszkiewicz, MS MPH is an Outcomes Analyst at Palo Alto Medical Foundation (PAMF), and serves as the Predictive Analytics and Data Science lead for PAMF’s Division of Clinical Business Analytics. His academic background has included healthy doses of chemical engineering, epidemiology, and biostatistics, with a strong focus on applied research. Other professional experience over the past 15 years has included roles as varied as cell fermentation process development in the pharmaceutical industry, air quality monitoring for both research and regulatory purposes, and quality management in solid organ transplantation.
This event will be held in Lindley Hall Rm. 102.
Jean-David Ruvini | Research & Development Director
Data Science in e-commerce Domain
Abstract: Trade is believed to have taken place throughout much of recorded human history and to have had a fundamental impact on the evolution of all societies. With hundreds of millions of buyers and sellers, and hundreds of millions of live listings at any point in time, online marketplaces like eBay provide an amazing playground for data scientists to work with data and conduct experiments at massive scale. However, while we are in the midst of an extraordinary period of computing platform revolution and a renaissance in artificial intelligence, the specific challenges that e-commerce raises in terms of data science are not widely understood. In this talk we will give an overview of research being done at eBay to leverage Machine Learning in the e-commerce domain. We will focus in particular on two areas, Machine Translation and Named Entity Recognition, and will show what makes these tasks particularly challenging.
Bio: Jean-David (JD) joined eBay Research Lab in 2007. Prior to this, he worked at Shopping.com Research Lab, where he helped design and improve Shopping.com’s classification and attribute extraction technologies. Before that, Jean-David spent five years at the Bouygues Research Lab (Bouygues is a French conglomerate with telco, television, construction and water supply subsidiaries) working on machine learning projects. He obtained his Ph.D. in Computer Science (Intelligent User Interfaces) from the University of Montpellier in France in 2000.
This event will be held in Lindley Hall Rm. 102.
Shaun Grannis | Associate Director, Center for Biomedical Informatics, Regenstrief Institute, Inc.
Healthcare Data Analytics in the Age of Big Data: Real World Examples and Opportunities for the Future
Abstract: The pace of healthcare discovery and translational innovation is accelerating due to an explosion of electronic health data. The Indiana Network for Patient Care (INPC) is one of the nation’s longest running and largest health information exchanges (HIEs), which integrates and standardizes disparate clinical data for most Indiana hospitals. Such an unparalleled system is useful for a variety of purposes, including developing new health data analytics methods and applications, supporting population health, and advancing precision medicine. This presentation will provide an overview of projects supported by the INPC, including specific use cases highlighting health data analytics, population health, and precision medicine.
Bio: Dr. Shaun Grannis, MD MS FAAFP FACMI, is Interim Director of the Regenstrief Center for Biomedical Informatics and Associate Professor of Family Medicine at the Indiana University School of Medicine. Since joining Regenstrief in 2001, his work has focused on developing and testing big data analytic solutions in support of population health and public health informatics. He led one of the nation’s first initiatives to develop, deploy, and evaluate a statewide real-time public health surveillance system in conjunction with the Indiana State Department of Health, which received national recognition for its effectiveness and sustainability. Dr. Grannis also develops HIE-based machine learning approaches to detecting cases reportable to public health across large regions, and has constructed and evaluated methods for seamlessly delivering just-in-time public health alerts to physicians.
A copy of the slides used in this talk can be found here.
Michael Sutton | Chief Knowledge Officer, Chief Gamification Officer, Funification LLC
EI (not just AI) in Data Sciences
Abstract: One of the major challenges within the data science field is the imbalance between Data Scientists’ technical expertise, competencies, skills, and knowledge on one hand and their professional soft skills on the other. Business Intelligence, Competitive Intelligence, Government Intelligence, Big Data, Data Mining, Artificial Intelligence, etc., all emphasize the cognitive skills of budding data scientists. However, managers, supervisors, team leads, and executives are seeking analysts and programmers in the workplace who understand team dynamics and can develop leadership styles that help overcome the cognitive bias demonstrated in many data lakes. Director-level executives are trying to recruit Data Scientists who can balance Emotional Intelligence (EI) with technical expertise and who are open to accepting the benefits of coaching and mentoring in long-term professional development. Dr. Sutton will outline a range of pragmatic tools for building balanced leaders and team members, which form the foundation for applying Design Thinking within Data Science.
- Insight into the value proposition for increased Emotional Intelligence skill sets within the Data Science field
- Potential opportunities for applying Design Thinking in Data Science
- Knowledge nuggets encompassing leadership and teamship behavior within Data Science
Bio: Michael demonstrates his leadership skills in his roles as a Game-Based Learning Innovator, Architect, and Edupreneur. His current applied research and consulting focuses upon architecting and delivering higher education environments using serious games, immersive learning environments, and simulations that leverage sustainable learning experiences and increased learning and performance for:
- Employee/learner engagement, creativity, and innovation
- Design thinking/visual thinking
- Leadership, teamship, followship, and communityship capacity building
- Intrapreneurship/entrepreneurship competencies
- Knowledge mobilization expertise (knowledge acquisition, production, sharing, and diffusion)
This lecture will be held in Student Building Rm. 015.
Philip Beesley | Professor in Architecture at the University of Waterloo, Professor of Digital Design and Architecture & Urbanism at the European Graduate School
Abstract: Philip Beesley of Waterloo Architecture will present recent work by the Living Architecture Systems group that explores a new generation of sentient architectural environments. Working with artists, engineers and scientists, the research collective combines the crafts of lightweight textile structures, dense arrays of distributed computer controls with machine learning, and artificial-life chemistry. New architectural installations within the collaboration feature dense reticulated grottos with breathing, reactive, near-living qualities. Thin layers of hovering filters are tuned for delicate kinetic and chemical responses in the form of expanded physiologies, beckoning and sharing space with viewers.
The presentation will suggest that conception of buildings can move from classical ideas of a static world of closed boundaries toward the expanded physiology and dynamic form of a metabolism. The architecture of historical Humanism encouraged stripped surfaces supporting free human action. The systems that appear within life-giving forests and jungles seem opposite to the rigid, stable enclosures of classically defined building. Instead of valuing resistance and closure, design for thermal exchange could result in new form-languages based on maximum interaction. Architecture could be founded on adaptation and uncertainty where acquiring and shedding heat play in uneven cycles. The kind of diffusive forms seen in reticulated snowflakes and the microscopic manifolds of mitochondria have a common form-language of radical exfoliation. Their increased surface areas can make their reaction-surfaces potent. These kinds of forms offer delicacy, resonance and resilience.
Writ large, these forms speak of involvement with the world. A new city designed to easily handle unstable conditions of shedding heat and cooling and then rapidly warming up and collecting heat again might well look like a hybrid forest where each building is made from dense layers of ivy-like filters and multiple overlapping layers of porous openings. A building system using an expanded range of reticulated screens and canopies is implied, constructed from minutely balanced filtering layers that can amplify and guide convective currents encircling internal spaces.
Bio: Philip Beesley, MRAIC OAA RCA, is a practicing visual artist, architect, and Professor in Architecture at the University of Waterloo and Professor of Digital Design and Architecture & Urbanism at the European Graduate School.
He serves as the Director of the Living Architecture Systems Group and as Director of Riverside Architectural Press. His Toronto-based practice, Philip Beesley Architect Inc., operates in partnership with the Europe-based practice Pucher Seifert and the Waterloo-based Adaptive Systems Group, and in numerous other collaborations. The studio’s methods incorporate industrial design, digital prototyping, and mechatronics engineering. Beesley frequently collaborates with artists, scientists and engineers. Recent projects include a series of hybrid fabrics developed with Atelier Iris van Herpen, curiosity-based machine learning environments developed with Rob Gorbet and Dana Kulic of the Adaptive Systems Group, and synthetic metabolisms developed with Rachel Armstrong of the University of Newcastle. His most recent collaboration with Iris van Herpen has translated a shared sensibility for subtle materials, electricity, and chemistry into a series of highly complex and diverse textile and haute couture collections.
His research focuses on responsive and distributed architectural environments and interactive systems, flexible lightweight structures integrating kinetic functions, microprocessing, sensor and actuator systems, with particular focus on digital fabrication methods and sheet-material derivations. Beesley has authored and edited sixteen books and proceedings, and has appeared on the cover of Artificial Life (MIT), LEONARDO and AD journals. Features include national CBC news, Vogue, WIRED, and a series of TED talks. His work was selected to represent Canada at the 2010 Venice Biennale for Architecture, and has received distinctions including the Prix de Rome, VIDA 11.0, FEIDAD, Azure AZ, and Architizer A+.
Dan Putler | Chief Scientist at Alteryx
Projecting Election Polling Results to Small Geographic Areas
Abstract: Election polling generally provides a useful means of predicting voter behavior nationally and at the state level. However, due to limited sample sizes and the cost associated with polling, understanding and predicting voter behavior at a “hyper local” level is not generally done, even though this information would be invaluable for better focusing grassroots campaign activities such as door-to-door canvassing, get-out-the-vote efforts, and yard signs and other outdoor media. In this work we set out to develop a model to make election predictions at a very localized level. Our solution accomplishes this in two parts: the first is the creation of a predictive model of voter choice based on both local area factors (such as county-level Partisan Voting Index values) and individual demographic/socioeconomic characteristics; the second is estimating the number of individuals who fall into specific demographic/socioeconomic profiles within each small geographic unit.
Our final model predicts voting outcomes at the census tract level (census tracts typically have populations of around 4,000 people), and the results will be available for the public to explore in an interactive web app on the Alteryx Gallery. In this talk we will look at the approach and methodologies used to develop these predictions for the upcoming election, which were implemented using a combination of R and Alteryx. In addition, if time permits, we will present an assessment of the predictive capability of the approach relative to the actual November 8 election results at the county level (the lowest level of geography for which election returns are generally available).
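To make the two-part structure concrete, here is a minimal Python sketch of how the parts could be combined, assuming the voter-choice model outputs a probability of voting for a given candidate for each (tract, profile) pair and the second step outputs an estimated head count per profile in each tract. The combination is a standard poststratification-style weighted average; it is not the speakers’ R/Alteryx implementation, and the function name, tract IDs, profile labels, and numbers are all hypothetical and purely illustrative.

```python
from typing import Dict, Tuple


def project_tract_shares(
    choice_prob: Dict[Tuple[str, str], float],
    profile_counts: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Combine the two parts into a predicted vote share per tract (hypothetical).

    choice_prob[(tract, profile)]: part 1, P(vote for candidate A) for a person
        with that profile in that tract (local factors such as county PVI make
        the probability tract-dependent).
    profile_counts[tract][profile]: part 2, estimated number of people with
        that profile living in the tract.
    Returns the count-weighted average probability for each tract.
    """
    shares: Dict[str, float] = {}
    for tract, counts in profile_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # skip tracts with no estimated population
        expected_a = sum(n * choice_prob[(tract, profile)]
                         for profile, n in counts.items())
        shares[tract] = expected_a / total
    return shares


# Illustrative inputs only (not real model output or census estimates).
choice_prob = {
    ("T1", "young_urban"): 0.60, ("T1", "older_rural"): 0.30,
    ("T2", "young_urban"): 0.55, ("T2", "older_rural"): 0.25,
}
profile_counts = {
    "T1": {"young_urban": 3000, "older_rural": 1000},
    "T2": {"young_urban": 500, "older_rural": 3500},
}

if __name__ == "__main__":
    # T1 is about 0.525 and T2 about 0.2875 for these inputs.
    for tract, share in project_tract_shares(choice_prob, profile_counts).items():
        print(tract, round(share, 4))
```

Separating the two parts this way means the choice model can be trained on individual-level survey data while the profile counts come from a different source for every small area, which is what lets predictions reach the tract level without tract-level polling.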
Bio: Dr. Dan Putler is the Chief Scientist at Alteryx, where he is responsible for developing and implementing the product road map for predictive analytics. He has over 30 years of experience in developing predictive analytics models for companies and organizations that cover a large number of industry verticals, ranging from the performing arts to B2B financial services. He is co-author of the book, “Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R”, which is published by Chapman and Hall/CRC Press. Prior to joining Alteryx, Dan was a professor of marketing and marketing research at the University of British Columbia’s Sauder School of Business and Purdue University’s Krannert School of Management.
Qiaozhu Mei | Associate Professor, University of Michigan School of Information
Learning Representations for Large-Scale Networks
Abstract: Recent successes in big data analytics and representation learning have drawn considerable attention. While deep learning techniques have demonstrated their power on image, text, and speech data, learning useful representations for discrete networked data remains a major challenge. In this talk, Dr. Mei will introduce recent progress made by his research group in learning representations of large-scale network data. He will introduce efficient algorithms that embed nodes into a continuous vector space so that both local and global structural information is preserved. By further projecting the representation into a 2D space, we are able to visualize millions of high-dimensional data points meaningfully on a single slide. Learning the representation of a network as a whole demonstrates that the topological structure of a network has predictive power for its growth.
Bio: Qiaozhu Mei is an associate professor at the School of Information, University of Michigan. He is widely interested in data mining, machine learning, information retrieval and their applications to the Web, natural language, social networks, and health informatics. He is a recipient of the NSF CAREER Award and multiple best paper awards at ICML, KDD, WSDM, and other related venues. He has co-chaired multiple research tracks of the WWW conference, has served on the editorial boards of multiple top journals, and is the general co-chair of SIGIR 2018.
Scott Swinford | Chief Operating Officer, Perscio
Analytics in Business
Abstract: The approach to analytics has changed rapidly over the past 25 years, having evolved from providing historical analysis to directing prescriptive solutions.
This talk will review the evolution of analytic tools and environments as well as how industries ranging from agriculture to hospitality leverage analytics. The talk will also include how to show value and gain acceptance for analytic efforts in business environments.
Bio: Scott is a consultant with expertise in analytics, pricing, supply chain management, and contact center management. He spent more than 20 years inside Fortune 500, publicly traded firms such as Starwood and Wyndham, where he was responsible for optimizing revenue for global business units, new products, and start-up divisions. He started his career as an analyst working to uncover revenue opportunities through the use of data and analytics. This work improved management decision-making for inventory optimization, new pricing strategies, and product offerings. Scott has held senior leadership roles in both global consumer and B2B environments, where he helped the large organizations he has served grow, innovate, and cut costs. He earned a B.S. in Industrial Engineering from Purdue University and a B.S. in Economics & Pre-Engineering from St. Joseph’s College.
A copy of the slides used in this talk can be found here.
Rishik Dhar | Principal Data Engineer at Target
Personalization, Recommendations and Similarity - Towards Real-Time, Personalized Recommendations
Abstract: The retail industry generally operates under very stringent timelines and the non-negotiable constraint of customer satisfaction. Running a retail business these days requires a combination of engineering practices and rigorous science. The next milestone for recommendation systems is to personalize recommendations in real time, driven by user interaction. To achieve this we need to rethink the conventional ideas around recommendation algorithms and their delivery mechanisms. This talk is a discussion of what we can do to make the most appropriate suggestions to consumers and to deliver them fast enough to be relevant to their decision-making process.
Biography: Rishik Dhar is a Principal Data Engineer at Target’s Data Science and Engineering Center of Excellence in Sunnyvale, CA. In his applied data science work, he solves retail problems by applying machine learning on Target’s big data platform, using distributed and parallel computing paradigms. He received his MS in Electrical and Computer Engineering from Carnegie Mellon University, with a concentration on applications of Machine Learning in Automated Speech Recognition. Part of his job is to explore new technology areas relevant to Data Science and Engineering. He leads a team of Data Scientists and Engineers that delivers recommendations across Target’s online shopping experience. Rishik’s interests lie in Speech Technologies, Auditory Interfaces, Human Cognition, Assistive Technologies, Chat Bots and Social Good. In his free time, he likes to jam with his 6-year-old daughter to Bollywood songs. He is curious about Oculus VR on Samsung Gear, AI APIs, TensorFlow on GPUs and, most recently, WaveNet. He is also looking for engineers and scientists to join his team and help Target build state-of-the-art algorithms for solving retail problems as well as tackling general challenges in the area of applied machine learning.
A copy of the slides used in this talk can be found here.
Duru Ahanotu | Director of Corporate Measurement for Yahoo!'s Global Research and Insights Group
Organizing around Big Data
Abstract: The era of “Big Data” comes with big promises. The availability of massive amounts of data, along with the power to organize and process it, has strengthened our ability to bring evidence to bear on our decisions. I propose a model for organizing effectively around Big Data called an “Insights Supply Chain.” Embedded within a framework of experimentation and empowering data tools, an Insights Supply Chain transforms Big Data into actionable decisions. I provide some examples of how Yahoo uses Big Data to create engaging products and user experiences. I conclude with some reminders of the limits of our knowledge that have yet to change and challenge practitioners to respect the wide-reaching impacts they can have on individuals and society as a whole.
Bio: N. Duru Ahanotu, Ph.D. is the Director of Corporate Measurement for Yahoo’s Global Research and Insights group under the Marketing organization. Prior to this, Dr. Ahanotu led a data insights team for the Yahoo Advertising and Data organization. Before joining Yahoo, Dr. Ahanotu served as a Solutions Architect supporting and implementing price optimization software used by web publishers to set prices for premium advertising inventory.
A copy of the slides used in this talk can be found here.
Tom Arkins | Section Chief of IT and Informatics for Indianapolis EMS
So You Have Data, Now What?
Abstract: This talk covers how Indianapolis Emergency Medical Services uses data in its daily operations. We will discuss our interactions with the police department and other public health and safety agencies, as well as the use of technologies in a disaster setting.
Bio: Tom Arkins has been in public safety since 1986, beginning his career with the White River Fire Department. He has been with Wishard/Indianapolis EMS since 1994 serving as an EMT, Paramedic, EMS Supervisor, and Tactical Paramedic. Currently he serves as the Section Chief of IT and Informatics for Indianapolis EMS.
Larry Smarr | Director of the California Institute for Telecommunications and Information Technology (Calit2)
Abstract: The human body is host to 100 trillion microorganisms, ten times the number of cells in the human body, and these microbes contain 300 times the number of DNA genes that our human DNA does. The microbial component of our “superorganism” comprises hundreds of species with immense biodiversity. Thanks to the National Institutes of Health’s Human Microbiome Project, researchers have been discovering the states of the human microbiome in health and disease. To put a more personal face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last ten years, which reveals detailed examples of the episodic evolution of this coupled immune-microbial system. An elaborate software pipeline, running on high performance computers, reveals the details of the microbial ecology and its genetic components. A variety of data science techniques are used to pull biomedical insights from this large data set. We can look forward to revolutionary changes in medical practice over the next decade.
Bio: Larry Smarr is the founding Director of the California Institute for Telecommunications and Information Technology (Calit2), a UC San Diego/UC Irvine partnership, and holds the Harry E. Gruber professorship in the Department of Computer Science and Engineering (CSE) of UCSD’s Jacobs School of Engineering. Before that he was the founding director of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. He is a member of the National Academy of Engineering, as well as a Fellow of the American Physical Society and the American Academy of Arts and Sciences. In 2006 he received the IEEE Computer Society Tsutomu Kanai Award for his lifetime achievements in distributed computing systems, and in 2014 the Golden Goose Award. He served on the NASA Advisory Council to 4 NASA Administrators, was chair of the NASA Information Technology Infrastructure Committee and the NSF Advisory Committee on Cyberinfrastructure, was a member of the DOE Advanced Scientific Computing Advisory Committee and ESnet Policy Board, and for 8 years was a member of the NIH Advisory Committee to the NIH Director, serving 3 directors. His personal interests include growing orchids, snorkeling coral reefs, and quantifying the state of his body.
Pedro Alves | Director of Data Science at Sentient Technologies
Practical Data Science
Abstract: This talk will cover practical points in data science that come up in industry as well as some real-world examples of projects and problems. The problems discussed in this talk will focus on aspects of projects that are usually not taught in data science courses but are where a considerable portion of work is done. This talk will also briefly cover the history and theory behind ensembles and why they work so well.
Biography: Pedro has experience in predicting, analyzing and visualizing data in the fields of: genomics, gene networks, cancer metastasis, insurance fraud/costs, hospital readmissions, soccer strategies, joint injuries, social graphs, human attraction, spam detection and topic modeling, among others. Pedro is incredibly passionate about all aspects of data science and is constantly creating new techniques and algorithms to suit the problems at hand. At Banjo, his efforts were geared towards detecting and interpreting everything that is happening in the world in real-time, from major concerts and sporting events to major and minor news. Now he leads the data science efforts at Sentient.ai where they use evolutionary algorithms and massively scaled deep learning to solve problems such as trading and visual comprehension of consumer products.
Woodburn Hall 200
Mauro Martino | Cognitive Visualization Lab at IBM Watson Cambridge, MA
Point, Line & Data: New methods for understanding complex data, from storytelling to machine learning
Abstract: The aesthetics of science is changing: the diffusion of data visualization tools is enabling a revival of beauty in scientific research. More and more papers are presented with seductive images, convincing videos, and sharp interactive tools. Scientific storytelling will be discussed through two case studies: “Charting Culture, 2014” and “Rise of partisanship, 2015”. In the second part of the talk we explore the connection between Machine Learning and Data Visualization through three projects: News Explorer, an exploration of real-time news; Ted Watson, an exploration of a large corpus of videos; and Watson 500, an analysis of relationships between entities and topics in a specific corpus of data. We encourage the public to use these tools before the talk.
Biography: Mauro Martino is an Italian expert in data visualization based in Boston. He created and leads the Cognitive Visualization Lab at IBM Watson in Cambridge, Massachusetts, USA. Martino’s data visualizations have been published in the scientific journals Nature, Science, and the Proceedings of the National Academy of Sciences. His projects have been shown at international festivals including Ars Electronica, and art galleries including the Serpentine Gallery, UK, GAFTA, USA, and the Lincoln Center, USA.
Jointly organized by the Data Science program and the Cyberinfrastructure for Network Science Center, this talk is partially supported by Indiana University’s Consortium for the Study of Religion, Ethics and Society, a consortium sponsored by the Vice President for Research Office.
Talk details can be found at http://cns.iu.edu/cnstalks. All talks will take place in the new Social Science Research Commons, Woodburn Hall 200 (unless otherwise noted).
Abe Weston | Associate Principal, Fraud Analytics
Predictive Modeling of Pay-Per-Click Keyword Bid Values
Abstract: Pay-Per-Click (PPC) keywords are purchased by advertisers, ideally at minimum cost, in order to maximize profit. Data sets with predictors are generated from data collected through the Google Search API for Shopping and the Microsoft Ad Intelligence Service. Machine learning is then applied to predict future bid values.
Biography: Abe Weston works for Google as an Associate Principal in fraud analytics. He has an undergraduate degree in Industrial Technology, MS degrees in Telecommunications Systems and Data Mining, and a graduate certificate in Statistics. He enjoys traveling, camping, reading, hiking, biking, yoga, playing guitar, and spending time with his wife and three kids.
Woodburn Hall 200
Kalev Leetaru | Senior Fellow at the George Washington University Center for Cyber & Homeland Security
Quantifying, Visualizing, and Forecasting Global Human Society Through “Big Data”: What It Looks Like to Compute on the Entire Planet
Abstract: Put simply, the GDELT Project is a real-time index of global human society, inventorying the world’s events, emotions, and narratives as they happen. GDELT machine-translates the world’s information in real time across 65 languages and identifies the planet’s events, counts, quotes, people, organizations, locations, millions of themes and thousands of emotions, imagery, video, and social posts, creating a massive real-time global graph. Here’s what it looks like to conduct data analytics at a truly planetary scale.
Biography: One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, Kalev Leetaru is a Senior Fellow at the George Washington University Center for Cyber & Homeland Security and a member of its Counterterrorism and Intelligence Task Force, as well as a 2015-2016 Google Developer Expert for Google Cloud Platform. From 2013-2014 he was the Yahoo! Fellow in Residence of International Values, Communications Technology & the Global Internet at Georgetown University’s Edmund A. Walsh School of Foreign Service, where he was also an Adjunct Assistant Professor, as well as a Council Member of the World Economic Forum’s Global Agenda Council on the Future of Government. His work has been featured in the press of more than 100 nations, in outlets ranging from Nature to the New York Times, and focuses on how innovative applications of the world’s largest datasets, computing platforms, algorithms and mindsets can reimagine the way we understand and interact with our global world. More on his latest projects can be found on his website at http://www.kalevleetaru.com/ or http://blog.gdeltproject.org.
Jointly organized by the Data Science program and the Cyberinfrastructure for Network Science Center, this talk is partially supported by Indiana University’s Consortium for the Study of Religion, Ethics and Society, a consortium sponsored by the Vice President for Research Office.
Jure Leskovec | Assistant Professor of Computer Science at Stanford University
Machine Learning for Human Decision Making
Abstract: In many real-life settings, human judges make decisions by choosing among many alternatives: a medical doctor choosing a treatment for a patient, a criminal court judge deciding on a defendant, a crowd-worker labeling an image, or a student answering a multiple-choice question. Gaining insight into human decision making is important both for determining the quality of individual decisions and for identifying human mistakes and biases.
In this talk we discuss how to develop machine learning methods for estimating the quality of individual judges and for obtaining diagnostic insights into how various judges decide on different kinds of items. We develop a series of increasingly powerful hierarchical Bayesian models, which infer latent groups of judges and items with the goal of obtaining insights into the underlying decision process. We apply our framework to a wide range of real-world domains and demonstrate that our approach can accurately predict judges’ decisions, diagnose the types of mistakes judges tend to make, and infer the true labels of items.
Bio: Jure Leskovec is an assistant professor of Computer Science at Stanford University and chief scientist at Pinterest. Computation over massive data is at the heart of his research, which has applications in computer science, the social sciences, economics, marketing, and healthcare. This research has won several awards, including a Lagrange Prize, a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, and numerous best paper awards. Leskovec received his bachelor’s degree in computer science from the University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and completed postdoctoral training at Cornell University. You can follow him on Twitter @jure.