Sungwook Yoon

A.I./Data to Production

Interests

I know Artificial Intelligence. I know Machine Learning. I now know Big Data. My interest is in connecting these dots. As for particular domains: among many businesses, I have experience with InfoSec data and marketing data.

Experiences

AI Research

Published in top-tier conferences and journals: AAAI, ICAPS, NIPS, UAI, ICML, JAIR, and JMLR

Deliveries

100% successful deliveries on Big Data solutions

Multiple wins in international competitions

Vision

The gap between data and model is still huge. The Big Data era's next challenge is modeling.

Education

Purdue University

Ph.D. in Computer Engineering

Seoul National University

M.S. in Electrical Engineering. Specialized in video data compression and networking

Seoul National University

B.S. in Electrical Engineering

Tech Skills

  • Hadoop/MapReduce
  • Spark/H2O
  • Elasticsearch/Kibana
  • R/Python/Scala/SQL
  • Bash/C/C++/LinuxSysAdmin
  • Lisp/Scheme
  • AWS
  • GCE

Professions

MapR Technologies

Principal Data Scientist

2014-

Big Data Solution Deliveries

Tracevector

Sr. Data Scientist

2013-

Malware Expression Detection

Seven Networks

Architect

2013-

Mobile Signaling Optimization

Identified

Data Scientist

2012-

Social inference on Web users for Career Analysis

PARC

Researcher

2008-

Worked on various DARPA and Xerox projects

Arizona State University

Research Assistant Professor

2007-2008

Involved in the DARPA Integrated Learning (IL) project. Taught classes. Led a reading group on AI-related papers

Arizona State University

Postdoctoral Scholar

2006-2007

Successfully led the ASU effort on the DARPA Integrated Learning program.

Purdue University

Teaching Assistant

2002-2006

Digital Logic, Python, Scheme

Awards

Best Paper Award Runner Up

Journal of Artificial Intelligence, 2011

Best Paper Award Runner Up

International Conference on Automated Planning and Scheduling, 2011

Best Learner

First International Learning and Planning Competition

(Unofficial) Winner

Second International Probabilistic Planning Competition, 2006

Overall Winner

First International Probabilistic Planning Competition, 2004

Projects

Real-Time Network Anomaly Detection System for Multiple Customers

We worked with a few Fortune 500 companies on their IT security data analysis projects for several months. We used Spark to enrich the streaming data and the Elasticsearch connector for Spark to ingest it into Elasticsearch for visualization. We developed a machine learning system using Spark MLlib for baseline analysis and traffic patterns. We also used Spark GraphX to build a consistent topology of the customer network. The PageRank algorithm and connected-component analysis in GraphX helped the customer easily find significant lateral data movement. We used Scala for Spark development. Upon customer request, we gave PySpark demos.
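To illustrate the connected-component idea mentioned above, here is a minimal pure-Python sketch. It is not the GraphX code used in the project: it swaps in a simple union-find, and the host names and flow list are hypothetical, illustrative data only.

```python
from collections import defaultdict


def connected_components(edges):
    """Group the nodes of an undirected graph into connected components."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        # Path halving keeps the union-find trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort for a deterministic, readable result.
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


# Hypothetical observed flows between hosts (assumed data, not from the project).
flows = [("web-1", "db-1"), ("db-1", "backup-1"), ("laptop-7", "printer-2")]
components = connected_components(flows)
print(components)
```

Hosts that end up in the same component can reach one another through some chain of observed flows, which is the kind of grouping that makes unexpected lateral movement easier to spot.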

Data Ingestion Into MapR

We performed several data ingestion services for multiple customers. The data came mostly from existing databases or streaming text logs, and we ingested it into MapR FS, MapR DB, or OpenTSDB. The tools used were Sqoop, Spark Streaming, Logstash, and Bash scripts.

Data Science Engagements

Use case discovery with several customers. Developed machine learning code in Scala, Spark, and H2O. Led successful workshops with customers on machine learning and Hadoop. Delivered successful engagements in use case discovery and code development. Developed the Machine Learning on Hadoop course and lab material.

Malware Expression Detection

It is impossible to detect a zero-day attack. As network components become extremely diversified, more vulnerable points are found. However, once malware sets in, the ways it expresses itself on the network are quite limited, since monetization behavior does not evolve as fast. I focused on detecting monetization behavior, particularly infected machines participating in DDoS attacks. I used Hadoop/Spark/Scala to generate and verify hypotheses and models. Then I implemented the algorithm in C++ for production.

Mobile Signaling Optimization

A big challenge for mobile operators is signaling congestion. Modern smartphones make periodic data connections to pull data from or push data to the cloud. The amount of data is small, but the burden of setting up TCP connections on the mobile network is huge for network operators. I worked to optimize the signaling traffic. I analyzed mobile traffic, proposed key performance metrics, and developed an optimization algorithm. This involved a lot of Hadoop data processing work and machine learning algorithm design.

Occupation Inference

People express their job titles as they want. What is their real occupation? I modeled this problem as a Hidden Markov Model. I used NGramDistance to model emission probabilities and developed a mechanism for learning transition probabilities. I used Java for the development. This is basically an NLP project.
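The Hidden Markov Model framing above can be sketched as follows. This is a minimal illustration, not the project's Java code: the character-bigram Jaccard similarity is a stand-in for NGramDistance, the emission scores are unnormalized, and the occupations, job titles, and transition probabilities are all hypothetical.

```python
import math


def bigrams(text):
    """Character bigrams of a lower-cased string padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def similarity(a, b):
    """Jaccard overlap of character bigrams (assumed stand-in for an n-gram distance)."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb)


def emit(state, title):
    """Smoothed, unnormalized emission score of a free-text title given an occupation."""
    return 0.01 + similarity(state, title)


def viterbi(observations, states, start_p, trans_p, emit_fn):
    """Most likely hidden-state sequence, computed in log space."""
    score = {s: math.log(start_p[s]) + math.log(emit_fn(s, observations[0])) for s in states}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_fn(s, obs))
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical canonical occupations and a career written as free-text titles.
states = ["software engineer", "data scientist"]
titles = ["software engineer ii", "data scientist (ml)"]
start_p = {s: 0.5 for s in states}
trans_p = {
    "software engineer": {"software engineer": 0.8, "data scientist": 0.2},
    "data scientist": {"software engineer": 0.2, "data scientist": 0.8},
}
path = viterbi(titles, states, start_p, trans_p, emit)
print(path)
```

In this sketch the transition table is fixed by hand. Learning it from data, as described above, would replace the hard-coded `trans_p`.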

Social Inference

I worked on a social inference framework over terabyte-scale Facebook data. I wrote SQL programs to process the data quickly and used R to visualize and analyze it.

Workforce Optimization

This was a Xerox project. Xerox/ACS maintains call centers, and we tried to optimize the number of call center agents working. We were given a database without much explanation. We went into the database and identified the tables we needed. Then we modeled the call arrivals as bucketed Poisson arrivals. Our call arrival model was highly accurate, with R-squared values above 0.96. We then used AI planning technology to plan agent hiring and firing strategies. We used Erlang queueing formulas to identify the number of agents needed to satisfy the service level. The final product was SaaS. We used a jQuery/Tomcat/Servlet/Hibernate framework.
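The staffing step above can be sketched with the Erlang C formula. This is a minimal illustration under assumptions: the project description does not say which Erlang formula was used, and the arrival rate, handle time, and service-level target below are hypothetical numbers.

```python
import math


def erlang_c(agents, offered_load):
    """Probability that an arriving call has to wait (Erlang C)."""
    if agents <= offered_load:
        return 1.0  # Unstable queue: every call waits.
    # Erlang B via the standard recursion, then convert to Erlang C.
    blocking = 1.0
    for n in range(1, agents + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    utilization = offered_load / agents
    return blocking / (1 - utilization * (1 - blocking))


def service_level(agents, arrival_rate, handle_time, target_wait):
    """Fraction of calls answered within target_wait (same time unit as handle_time)."""
    load = arrival_rate * handle_time  # offered load in Erlangs
    if agents <= load:
        return 0.0
    wait_prob = erlang_c(agents, load)
    return 1 - wait_prob * math.exp(-(agents - load) * target_wait / handle_time)


def agents_needed(arrival_rate, handle_time, target_wait, target_level):
    """Smallest number of agents that meets the service-level target."""
    agents = max(1, math.ceil(arrival_rate * handle_time))
    while service_level(agents, arrival_rate, handle_time, target_wait) < target_level:
        agents += 1
    return agents


# Hypothetical bucket: 100 calls per 30 minutes, 180 s average handle time,
# target of 80% of calls answered within 20 s.
rate = 100 / 1800  # calls per second
needed = agents_needed(rate, handle_time=180, target_wait=20, target_level=0.8)
print(needed, service_level(needed, rate, 180, 20))
```

With bucketed Poisson arrivals, a calculation like this would be run once per time bucket using that bucket's arrival rate.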

Factored Particle Filtering for War Situation Tracking

In war, it is critically important to identify the current situation from data streaming live from the battlefield. We modeled the streaming data from the field as factored particles. We developed a situation awareness framework based on the factored particle filtering paradigm. I developed the idea and delivered the Java code.

Massive War Situation Exploration

The number of potential situations in war is extremely large. I used a compact representation and a BDD-style compression technique to efficiently explore the practically unbounded space of possible situations. This was originally coded in Lisp, and I ported it to Java.

Learning from Military Experts

In an air military campaign, airspace is a scarce resource that every unit needs to share. The airspace manager maintains a feasible airspace schedule. How he/she accepts a mission and when he/she requests modification of the original airspace request is not clear. But we had the data, recorded from the managers' operations. We learned from the data how he/she selected particular missions and when he/she asked for modification. I built the knowledge representation framework. I designed the machine learning algorithm for the highly skewed data distribution. I coded and delivered it in Java.

Learning from Problem Solving

This was my whole thesis. I tried to learn problem-solving strategies from human demonstration. For example, I tried to learn Hearts strategy from human play. In doing so, I learned machine learning (classic and modern, inside and out), knowledge representation techniques, probabilistic reasoning and statistics, and optimization.

Journal Publications

Learning Probabilistic Hierarchical Task Networks as Probabilistic Context-Free Grammars to Capture User Preferences

Nan Li, Will Cushing, Subbarao Kambhampati, Sungwook Yoon. 2012

Transactions on Intelligent Systems and Technology (TIST)

An Ensemble Architecture for Learning Complex Problem-Solving Techniques From Demonstration

Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Darren Scott Appling, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R Michaelis, Hala Mostafa, Santiago Ontanon, Charles Parker, Jainarayan Radhakrishnan, Antons Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Dan Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong. 2011

Transactions on Intelligent Systems and Technology (TIST)

Learning Linear Ranking Functions for Beam Search with Application to Planning

Yuehua Xu, Alan Fern and Sungwook Yoon. 2009

Journal of Machine Learning Research (JMLR)

Learning Control Knowledge for Forward Search Planning

Sungwook Yoon, Alan Fern and Robert Givan. 2008.

Journal of Machine Learning Research (JMLR)

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

Alan Fern, Sungwook Yoon, Robert Givan. 2006.

Journal of Artificial Intelligence Research

Conference Publications

Anticipatory Online Planning

Ethan Burns, Wheeler Ruml, J. Benton, Minh Binh Do, Sungwook Yoon.

International Conference on Automated Planning and Scheduling (ICAPS-2012)

Hybrid Qualitative Simulation of Military Operations

Thomas Hinrichs, Kenneth Forbus, Johan de Kleer, Sungwook Yoon, Eric Jones, Robert Hyland, Jason Wilson.

Innovative Applications of Artificial Intelligence (IAAI-2011)

Improving Determinization in Hindsight for On-line Probabilistic Planning

Sungwook Yoon, Wheeler Ruml, J. Benton, Minh Binh Do

International Conference on Automated Planning and Scheduling (ICAPS-2010)

Iterative Learning of Weighted Rule Sets for Greedy Search

Yuehua Xu, Alan Fern, Sungwook Yoon

International Conference on Automated Planning and Scheduling (ICAPS-2010)

Factored Envisioning

Johan De Kleer, Kenneth D. Forbus, Tom Hinrichs, Sungwook Yoon and Eric K. Jones

Qualitative Reasoning (QR-2009)

Learning Probabilistic Hierarchical Task Networks to Capture User Preferences

Nan Li, Subbarao Kambhampati and Sungwook Yoon

International Joint Conference on Artificial Intelligence (IJCAI-2009)

An Ensemble Learning and Problem Solving Architecture for Airspace Management

Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Darren Scott Appling, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R Michaelis, Hala Mostafa, Santiago Ontanon, Charles Parker, Jainarayan Radhakrishnan, Antons Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Dan Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong.

Innovative Applications of Artificial Intelligence (IAAI-2009)

An Online Learning Method for Improving Over-subscription Planning

Sungwook Yoon, J. Benton, Subbarao Kambhampati

International Conference on Automated Planning and Scheduling (ICAPS-2008)

Probabilistic Planning via Determinization in Hindsight

Sungwook Yoon, Alan Fern, Subbarao Kambhampati and Robert Givan

National Conference on Artificial Intelligence (AAAI-2008)

FF-Replan: A Baseline Probabilistic Planner

Sungwook Yoon, Alan Fern and Robert Givan

International Conference on Automated Planning and Scheduling (ICAPS-2007)

Using Learned Policies in Heuristic-Search Planning

Sungwook Yoon, Alan Fern, and Robert Givan

International Joint Conference on Artificial Intelligence (IJCAI-2007)

Discriminative Learning of Beam-Search Heuristics for Planning

Yuehua Xu, Alan Fern, and Sungwook Yoon

International Joint Conference on Artificial Intelligence (IJCAI-2007)

Learning Heuristic Functions from Relaxed Plans

Sungwook Yoon, Alan Fern, and Robert Givan

International Conference on Automated Planning and Scheduling (ICAPS-2006)

Learning Measures of Progress for Planning Domains

Sungwook Yoon, Alan Fern, and Robert Givan

National Conference on Artificial Intelligence (AAAI-2005)

Learning Domain-Specific Control Knowledge from Random Walks

Alan Fern, Sungwook Yoon, and Robert Givan

International Conference on Automated Planning and Scheduling (ICAPS-2004)

Approximate Policy Iteration with a Policy Language Bias

Alan Fern, Sungwook Yoon, and Robert Givan

Advances in Neural Information Processing Systems 16 (NIPS-2003)

Inductive Policy Selection for First-Order Markov Decision Processes

Sungwook Yoon, Alan Fern, and Robert Givan

Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-2002)

Book Chapters

Explanation-Based Learning for Planning

Subbarao Kambhampati, Sungwook Yoon

Encyclopedia of Machine Learning 2010

Reinforcement Learning in Relational Domains: A Policy-Language Approach

Alan Fern, Sungwook Yoon and Robert Givan. 2005

Statistical Relational Learning, Editor: Lise Getoor, Ben Taskar

Workshop papers, Competition Proceedings

Towards Model-lite Planning: A Proposal for Learning and Planning with Incomplete Domain Models

Sungwook Yoon and Subbarao Kambhampati

ICAPS 2007 Workshop, AI Planning and Learning

Hierarchical Strategy Learning with Hybrid Representations

Sungwook Yoon and Subbarao Kambhampati

AAAI 2007 Workshop, Acquiring Planning Knowledge via Demonstration

Discrepancy Search with Reactive Policies for Planning

Sungwook Yoon

AAAI 2006 Workshop, Learning for Search

Learning Reactive Policies for Probabilistic Planning Domains

Sungwook Yoon, Alan Fern and Robert Givan

Proceedings of the First International Probabilistic Planning Competition

Relational Reinforcement Learning for Classical Planning

Alan Fern, Sungwook Yoon and Robert Givan

Workshop on Relational Reinforcement Learning

Professional Activities

Co-Organizer

The Third International Probabilistic Planning Competition, 2011

Volunteer Organizer

The Second International Probabilistic Planning Competition, 2006

Tutorial Instructor

Learning Techniques in Planning at ICAPS07

Reviewer, PC Member

Computational Intelligence, Machine Learning, IEEE Transactions On Automatic Control, Journal of Artificial Intelligence, Artificial Intelligence, AAAI, IJCAI, ICML, ICAPS

Sungwook Yoon — sungwook.yoon@gmail.com — (480) - 277-3458