Tyler Foxworthy | Chief Scientist - DemandJump Inc.
Analyzing Dynamic Networks with Topological Data Analysis
Abstract: The internet, social networks, the global transportation system… Complex networks are the substrate and substance of modern life. Unlike static networks, the structure and function of these systems evolve dynamically with time, which has traditionally posed significant challenges to analysts seeking to predict their behavior or optimize their performance. In recent years, Topological Data Analysis (TDA) has been demonstrated to be a meaningful platform for computing topological features of a wide variety of complex networks at multiple scales. Although relatively unknown by data scientists working in industry, TDA has over the last decade been gaining strong acceptance in the research community and seen a surge of growth in a wide variety of application areas, particularly when coupled with kernel-based machine learning techniques. The purpose of this talk is to discuss both the theoretical and practical aspects of applying TDA to the study of dynamic networks by demonstrating its application with Python and a year’s worth of real-time airline scheduling data.
Biography: Tyler Foxworthy is a mathematician and computational scientist working on a wide range of problems in machine learning and marketing optimization. An alumnus of Purdue University, his research is largely focused on the development of high performance algorithms for understanding complex networks, machine learning, and natural language processing. Tyler currently serves as the Chief Scientist of DemandJump Inc, and is a scientific advisor to many technology companies and investors in the Midwest. Tyler’s previous experience includes leadership and research roles in academia, biotech, and management consulting. Additionally, Tyler maintains an active research program and is a regular speaker at international scientific and industry conferences.
James "Jimi" Shanahan |
How Gradient and Autodiff are Transforming Deep Learning
Abstract: Just like electricity, the automobile, the Internet, and mobile phones transformed the 20th century, deep learning is transforming the 21st century, changing how people perceive and interact with technology, enabling machines to perform a wider range of tasks, in many cases doing a better job than humans. These applications include: voice assistants on our smartphones, product recommendation engines, self-driving cars, deep fakes, high frequency stock market trading, applications for social good (combating crime), playing games (from Go to Atari), preventing credit card fraud, filtering out spam from our email inboxes, detecting and diagnosing medical diseases, the list goes on and on. Large companies, such as Amazon, Apple, Facebook, Google, Microsoft, and venture capitalists alike are investing heavily in deep learning research and applications.
This talk focuses primarily on one of the key enablers of deep learning, that of optimization theory’s gradient descent and its sidekick, autodiff. Shakespeare might have structured such a talk as follows and used the lens of reverse mode autodiff to aid with understanding:
Act 1: Hack it up
Act 2: BackProp: theory to the rescue
Act 3: Layer by layer learning, a medieval pastime
Act 4: Introspection: better init. and activation functions
Act 5: Express-laning the gradient: Skip Connections, the SoTA frontier (LSTMs, ResNet, Highway Nets, DenseNets)
These five acts will be supported by examples and Jupyter notebooks in Python and TensorFlow. In addition, this talk will show how reverse mode autodiff provides an efficient and effective calculus framework that is transforming how we do machine learning and how we should teach it.
Bio: Jimi has spent the past 25 years developing and researching cutting-edge artificial intelligence systems, splitting his time between industry and academia. He has (co-)founded several companies including: Church and Duncan Group Inc. (2007), a boutique consultancy in large scale AI which he runs in San Francisco; RTBFast (2012), a real-time bidding engine infrastructure play for digital advertising systems; and Document Souls (1999), a document-centric anticipatory information system. In 2012 he went in-house as the SVP of Data Science and Chief Scientist at NativeX, a mobile ad network that got acquired by MobVista in early 2016. In addition, he has held appointments at AT&T (Executive Director of Research), Turn Inc. (founding chief scientist), Xerox Research, Mitsubishi Research, and at Clairvoyance Corp (a spinoff research lab from CMU). He also advises several high-tech startups (including Quixey, Aylien, ChartBoost, DigitalBank you.co, VoxEdu, and others).
Jimi has been affiliated with the University of California at Berkeley and at Santa Cruz since 2008 where he teaches graduate courses on big data analytics, machine learning, deep learning, and stochastic optimization. In addition, he is currently visiting professor of data science at the University of Ghent, Belgium. He has published six books, more than 50 research publications, and over 20 patents in the areas of machine learning and information processing. Jimi received his PhD in engineering mathematics from the University of Bristol, U.K., and holds a Bachelor of Science degree from the University of Limerick, Ireland. He is an EU Marie Curie fellow. In 2011 he was selected as a member of the Silicon Valley 50 (Top 50 Irish Americans in Technology).
Peng Wong | Chief Data Science Officer, Omeda
Abstract: Are companies thinking about AI? How does a company innovate with data science, and why should they? The purpose of this talk is to showcase how businesses leverage data science for growth, and in the process answer these questions. We will look at how organizations effectively compete using data science in their journey towards innovation with AI.
Bio: Peng’s role as the Chief Data Science Officer at Omeda is to provide leadership in launching new products and accelerating the pace of innovation. He possesses over 29 years of experience, having led analytics teams and strategic initiatives focused on business transformation and growth. Omeda is a leading audience relationship management platform, providing a real-time, single view of a company’s audience through 24/7 data storage, management, matching and activation. He helped Omeda build their first data science team and advanced analytic capabilities. Peng previously worked at Angie’s List where he was Senior Director of Advanced Analytics. Prior to Angie’s List, he co-founded Beyond Predictive, an advanced analytics consulting company, and has worked in many industries helping companies of various sizes transform their data and analytic strategies. Peng received his B.A. and M.S. in Computer Science from Southern Illinois University.
This event will be held in Rm. 1106 of Luddy Hall.
Bruno Miguel Tavares Goncalves | Moore-Sloane Data Science Fellow, NYU
Spatio-temporal Analysis of Language Use
Abstract: The advent of large-scale online social services coupled with the dissemination of affordable GPS-enabled smartphones resulted in the accumulation of massive amounts of data documenting our individual and social behavior. Using large datasets from sources such as Twitter, Wikipedia, Google Books and others, this talk will present several recent results on how languages are used across both time and space. In particular, we will analyze the role of multilinguals in social networks and how language dialects can be defined empirically based on how they are used in the real world. Finally, we will also analyze how English usage changes from place to place and over time and how languages can be used to identify communities within the urban environment.
Bio: Bruno Gonçalves is a Data Science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. His expertise is in using large-scale datasets for the analysis of human behavior. After completing his joint Ph.D. in Physics and MSc in C.S. at Emory University in Atlanta, GA, in 2008 he joined the Center for Complex Networks and Systems Research at Indiana University as a Research Associate. From September 2011 until August 2012 he was an Associate Research Scientist at the Laboratory for the Modeling of Biological and Technical Systems at Northeastern University. Since 2008 he has been pursuing the use of Data Science and Machine Learning to study human behavior. By processing and analyzing large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both overall and individual human behavior in an unobtrusive and widespread manner. The main applications of this research have been towards the study of Computational Linguistics, Information Diffusion, Behavioral Change and Epidemic Spread. Bruno is the author of 60+ publications with more than 5,200 Google Scholar citations and an h-index of 30. In 2015 he was awarded the Complex Systems Society’s Junior Scientific Award for “outstanding contributions in Complex Systems Science” and he is the editor of the book Social Phenomena: From Data Analysis to Models (Springer, 2015).
This talk will be held in Rm. 1106 of Luddy Hall.
Virginia Eubanks | Associate Professor of Political Science – University at Albany, SUNY
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor
Abstract: Today, automated systems control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data analytics, the most invasive and punitive systems are aimed at the poor. In her new book ‘Automating Inequality’, Virginia Eubanks systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of gut-wrenching and eye-opening stories, from a woman in Indiana whose benefits were literally cut off as she lay dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile. Join us to discuss this deeply researched, passionately written, incredibly timely book.
Bio: Virginia Eubanks is an Associate Professor of Political Science at the University at Albany, SUNY. She is the author of ‘Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor’; ‘Digital Dead End: Fighting for Social Justice in the Information Age’; and co-editor, with Alethia Jones, of ‘Ain’t Gonna Let Nobody Turn Me Around: Forty Years of Movement Building with Barbara Smith’. Her writing about technology and social justice has appeared in The American Prospect, The Nation, Harper’s and Wired. For two decades, Eubanks has worked in community technology and economic justice movements. Today she is a founding member of the Our Data Bodies Project and a Fellow at New America.
This event will be held in Rm. 1106 of Luddy Hall.
James Hendler | Tetherless World Chair of Computer, Web and Cognitive Sciences – RPI
Knowledge Representation in the Era of Deep Learning, Watson and the Semantic Web
Abstract: A burst in optimism (and unwarranted fear) has grown around a number of technologies that are high impact and able to solve problems that have challenged AI researchers for years. The over-enthusiasm that often follows such breakthroughs has caused some to declare (yet again) that it is the end of “knowledge representation” as AI moves into a world dominated by neural networks, data mining and the knowledge graph. In this talk, I argue that these technologies, while extremely powerful separately, are not only still a long way from human intelligence, but cannot get there without a level of knowledge and reasoning beyond what is currently available to these techniques. On the other hand, I also argue that taking these technologies into new and harder realms will require rethinking what traditional knowledge representation is and how it is used. Some early examples of work aimed at joining the approaches will be presented.
Bio: James Hendler is the Director of the Institute for Data Exploration and Applications and the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI. He also heads the RPI-IBM Center for Health Empowerment by Analytics, Learning and Semantics (HEALS) and serves as Chair of the Board of the UK’s charitable Web Science Trust. Hendler has authored over 400 books, technical papers and articles in the areas of Semantic Web, artificial intelligence, agent-based computing and high performance processing. One of the originators of the “Semantic Web,” Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the AAAI, BCS, the IEEE, the AAAS and the ACM. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In addition, he is the first computer scientist to serve on the Board of Reviewing Editors for ‘Science’, co-editor-in-chief of the journal ‘Data Intelligence’, and an associate editor of ‘Big Data’. In 2010, Hendler was named one of the 20 most innovative professors in America by ‘Playboy’ magazine and was selected as an “Internet Web Expert” by the US government. In 2012, he was an inaugural recipient of the Strata Conference “Big Data” awards for his work on large-scale open government data. In 2013, he was appointed as the Open Data Advisor to New York State and in 2015 appointed a member of the US Homeland Security Science and Technology Advisory Committee. In 2016, Jim became a member of the National Academies Board on Research Data and Information. In 2017, Hendler joined the Director’s Advisory Committee for the National Security Directorate of the Pacific Northwest National Laboratory.
This event will be held in the Grand Hall of Luddy Hall.
Rich Carlton | President and COO, Data Realty
Infrastructure to insights - cutting through the hype to true business value from data analytics
Abstract: There is tremendous hype surrounding Big Data and analytics, but to truly get business value from data, there are numerous pitfalls and considerations that must be taken into account. Learn the lessons that have enabled Aunalytics to serve global organizations from here in Indiana and helped them to get indisputable value from their data.
Bio: Rich Carlton leads Data Realty as President and COO after twenty-plus years of leadership experience in data and technology-based businesses. There are three corporate brands under his purview: the data center organization Data Realty, the Big Data and Analytics organization Aunalytics, and the Cloud Hosting and Managed Services company MicroIntegration. Carlton works towards driving the overall organization to meet the mission of harnessing the power of data to fuel the economic engine of growing companies, communities, and people. The organization hopes to lead clients toward a culture of data-driven decision-making, differentiating them within their industry and providing true competitive advantage.
Carlton earned a Bachelor of Science degree from the Indiana University Kelley School of Business in 1992 with a focus in Computer and Information Systems. He is past Chairman of the Board for the St. Joseph County Chamber of Commerce. In addition, he serves on the Indiana state Chamber of Commerce Technology board, the Quality Committee of St. Joseph Regional Medical Center, the Entrepreneurship board of the South Bend-Elkhart Regional Development Authority, and the Indiana University South Bend School of Medicine Foundation board.
This event will be held in Wells Library Rm. 001.
Prabhat | Head of Data and Analytics Services team at NERSC, U.C. Berkeley
Top 10 Data Analytics Problems in Science
Abstract: Lawrence Berkeley National Lab and NERSC are at the frontier of scientific research. Historically, NERSC has provided leadership computing for the computational science community, but we now find ourselves tackling Big Data problems from an array of observational and experimental sources. In this talk, I will review the landscape of Scientific Big Data problems at all scales, spanning astronomy, cosmology, climate, neuroscience, bioimaging, genomics, material science and subatomic physics. I will present a list of Top 10 Data Analytics problems from these domains, and highlight NERSC’s current Data Analytics strategy and hardware/software resources. I will highlight opportunities for engaging with NERSC, Berkeley Lab and the scientific enterprise.
Bio: Prabhat leads the Data and Analytics Services team at NERSC. His current research interests include scientific data management, parallel I/O, high performance computing and scientific visualization. He is also interested in applied statistics, machine learning, computer graphics and computer vision. Prabhat received an ScM in Computer Science from Brown University (2001) and a B.Tech in Computer Science and Engineering from IIT-Delhi (1999). He is currently pursuing a PhD in the Earth and Planetary Sciences Department at U.C. Berkeley.
Liangjie Hong | Head of Data Science at Etsy
A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation
Abstract: Recommending personalized content to users is a long-standing challenge to many online services, including Facebook, Yahoo!, LinkedIn and Twitter. Traditional recommendation models such as latent factor models and feature-based models are usually trained for all users and optimize an “average” experience for them, yielding sub-optimal solutions. Although multi-task learning provides an opportunity to learn personalized models per user, learning algorithms are often tailored to specific models (e.g., generalized linear model, matrix factorization), creating obstacles for a unified engineering interface, which is important for large Internet companies. This talk will present an empirical framework to learn user-specific personal models for content recommendation by utilizing gradient information from a global model, which potentially benefits any model that can be optimized through gradients, offering a lightweight yet generic alternative to conventional multi-task learning algorithms for user personalization. The effectiveness of the proposed framework is demonstrated by incorporating it in three popular machine learning algorithms including logistic regression, gradient boosting decision tree, and matrix factorization. An extensive empirical evaluation shows a significant improvement in the efficiency of personalized recommendations in real-world datasets.
Bio: Liangjie Hong is Head of Data Science at Etsy Inc., managing a group of data scientists to deliver cutting-edge scientific solutions for Search and Discovery, Personalization and Recommendation, and Computational Advertising. Previously, he was Senior Manager of Research at Yahoo Research from 2013 to 2016, leading science efforts for Personalization and Search Sciences. Liangjie has published papers in all major international conferences in data mining, machine learning and information retrieval, such as SIGIR, WWW, KDD, CIKM, AAAI, WSDM, RecSys and ICML, winning the WWW 2011 Best Poster Paper Award, a WSDM 2013 Best Paper nomination and the RecSys 2014 Best Paper Award, as well as serving as a program committee member in KDD, WWW, SIGIR, WSDM, AAAI, EMNLP, ICWSM, ACL, CIKM, IJCAI and several workshops. In addition, he regularly reviews articles for prestigious journals such as DMKD, TKDD, TIST, TIS, and TKDE. Liangjie co-founded the User Engagement Optimization Workshop, which has been held in conjunction with CIKM 2013 and KDD 2014. Prior to Yahoo Research, he obtained his Ph.D. (2013) and M.S. (2010) from Lehigh University and B.S. (2007) from Beijing University of Chemical Technology, all in Computer Science.
This talk will be held in Lindley Hall, Rm. 102.
Kimberly Van Auken | Database Curator, WormBase and Gene Ontology Consortium
Data Curation in the Biomedical Sciences: from Text to Databases to Knowledge Discovery
Abstract: Knowledge discovery in the biomedical sciences depends on accurate, consistent representation of data in knowledgebases. Over the past two decades the biocuration community, including the Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) (http://geneontology.org/), have been at the forefront of biological knowledge management. The vast and rapidly increasing amount of biomedical data, however, makes biocuration a very challenging task. In my talk, I’ll discuss methods we’ve employed at WormBase (https://wormbase.org) and the GOC to help meet these challenges, with specific emphasis on the use of: 1) text mining tools, such as Textpresso (http://textpresso.org/), to identify suitable papers and evidence sentences for curation, 2) controlled vocabularies and ontologies to model biological data, and 3) data capture, visualization, and analysis tools to engage users and foster knowledge discovery.
Bio: Kimberly Van Auken, Ph.D. is a Database Curator for WormBase, the online database housing the genetics, genomics and biology of Caenorhabditis elegans and other nematodes. She serves as an ontology editor and co-manager of the Annotation Working Group for the Gene Ontology Consortium and is a member of the editorial board for ‘Database: The Journal of Biological Databases and Curation’. She holds a B.S. in Biochemistry from the University of Rochester, Rochester, NY and a Ph.D. in Molecular, Cellular, and Developmental Biology from the University of Colorado, Boulder.
This event will be held in Rm. 102 of Lindley Hall until further notice.