Interests
I know Artificial Intelligence. I know Machine Learning. I now know Big Data. My interest is in connecting these dots. As for particular domains: among many businesses, I have experience with InfoSec data and marketing data.
Experiences
AI Research
Published in top-tier conferences and journals, including AAAI, ICAPS, NIPS, UAI, ICML, JAIR, and JMLR
Deliveries
100% successful deliveries on Big Data solutions
Multiple wins in international competitions
Vision
The gap between data and model is still huge. The next challenge of the Big Data era is modeling.
Education
Purdue University
Ph.D. in Computer Engineering
Seoul National University
M.S. in Electrical Engineering. Specialized in Video data compression and networking
Seoul National University
B.S. in Electrical Engineering
Tech Skills
- Hadoop/MapReduce
- Spark/H2O
- Elasticsearch/Kibana
- R/Python/Scala/SQL
- Bash/C/C++/LinuxSysAdmin
- Lisp/Scheme
Professions
MapR Technologies
Principal Data Scientist
2014-
Big Data Solution Deliveries
Tracevector
Sr. Data Scientist
2013-
Malware Expression Detection
Seven Networks
Architect
2013-
Mobile Signaling Optimization
Identified
Data Scientist
2012-
Social inference on Web users for Career Analysis
PARC
Researcher
2008-
Worked on various DARPA and Xerox projects
Arizona State University
Research Assistant Professor
2007-2008
Involved in the DARPA Integrated Learning (IL) project. Taught classes. Led a reading group on AI-related papers.
Arizona State University
Postdoctoral Scholar
2006-2007
Successfully led the ASU effort on the DARPA Integrated Learning program.
Purdue University
Teaching Assistant
2002-2006
Digital Logic, Python, Scheme
Awards
Best Paper Award Runner Up
Journal of Artificial Intelligence, 2011
Best Paper Award Runner Up
International Conference on Automated Planning and Scheduling, 2011
Best Learner
First International Learning and Planning Competition
(Unofficial) Winner
Second International Probabilistic Planning Competition, 2006
Overall Winner
First International Probabilistic Planning Competition, 2004
Projects
Real-Time Network Anomaly Detection System for Multiple Customers
We worked with a few Fortune 500 companies on their IT security data analysis projects for several months. We used Spark to enrich the streaming data and the Elasticsearch connector for Spark to ingest it into Elasticsearch for visualization. We developed a machine learning system using Spark MLlib for baseline analysis and traffic patterns. We also used Spark GraphX to develop a consistent topology of the customer network. The PageRank algorithm and connected component analysis in GraphX helped the customer easily find significant lateral data movement. We used Scala for Spark development. Upon customer request, we gave PySpark demos.
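The graph analysis above can be illustrated with a minimal plain-Python sketch. This is not the GraphX code from the project; the host names and flows are hypothetical, and the union-find and power-iteration routines below only stand in for the connected component and PageRank steps.

```python
from collections import defaultdict


def connected_components(edges):
    """Group hosts into weakly connected components using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank of dangling hosts is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical (source host, destination host) flows.
flows = [
    ("ws-1", "db-1"), ("ws-2", "db-1"), ("ws-3", "db-1"),
    ("db-1", "ws-1"), ("lab-1", "lab-2"),
]
components = connected_components(flows)
ranks = pagerank(flows)
print(components)
print(max(ranks, key=ranks.get))
```

In this toy data, the host that receives traffic from many others gets the highest rank, and hosts that never exchange traffic with the main network fall into a separate component. Those are the two signals the paragraph above describes using to surface lateral movement.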
Data Ingestion Into MapR
We performed data ingestion services for multiple customers. The data came mostly from existing databases or streaming text logs, and we ingested it into MapR FS, MapR DB, or OpenTSDB. The tools used were Sqoop, Spark Streaming, Logstash, and Bash scripts.
Data Science Engagements
Use case discovery with several customers. Machine learning code developed in Scala, Spark, and H2O. Led successful workshops with customers on machine learning and Hadoop. Delivered successful engagements in use case discovery and code development. Developed course and lab material for Machine Learning on Hadoop.
Malware Expression Detection
It is impossible to detect a zero-day attack. As network components become extremely diversified, more vulnerable points are found. However, once malware sets in, the ways it expresses itself on the network are quite limited, since monetization behavior does not evolve as fast. I focused on detecting monetization behavior, particularly infected machines participating in DDoS attacks. I used Hadoop/Spark/Scala to generate and verify hypotheses and models, then implemented the algorithm in C++ for production.
Mobile Signaling Optimization
A big challenge for mobile operators is signaling congestion. Modern smartphones make periodic data connections to pull or push data to the cloud. The amount of data is small, but the burden of setting up TCP connections on the mobile network is huge for network operators. I worked on optimizing the signaling traffic: I analyzed mobile traffic, proposed key performance metrics, and developed an optimization algorithm. This involved a lot of Hadoop data processing and machine learning algorithm design.
Occupation Inference
People express their job titles however they want. What is their real occupation? I modeled this problem as a Hidden Markov Model. I used NGramDistance to model emission probabilities and developed a mechanism for learning transition probabilities. I used Java for the development. This is basically an NLP project.
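A minimal sketch of the Hidden Markov Model idea described above, assuming hidden states are canonical occupations and observations are free-text titles from one person's career history. The character-bigram Dice similarity is only a stand-in for NGramDistance, the emission values are unnormalized scores and not true probabilities, and the occupations, titles, and transition values are hypothetical (the project learned its transitions from data).

```python
import math


def bigrams(text):
    """Set of character bigrams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def similarity(a, b):
    """Dice coefficient over character bigrams, between 0 and 1."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def emission_score(title, occupation, smoothing=0.05):
    """Unnormalized emission score; smoothing keeps the log finite."""
    return similarity(title, occupation) + smoothing


def viterbi(titles, occupations, start, trans):
    """Most likely occupation sequence for a sequence of observed titles."""
    # best[o] = (log score of the best path ending in o, that path)
    best = {
        o: (math.log(start[o]) + math.log(emission_score(titles[0], o)), [o])
        for o in occupations
    }
    for title in titles[1:]:
        step = {}
        for cur in occupations:
            prev = max(occupations, key=lambda p: best[p][0] + math.log(trans[p][cur]))
            score = (best[prev][0] + math.log(trans[prev][cur])
                     + math.log(emission_score(title, cur)))
            step[cur] = (score, best[prev][1] + [cur])
        best = step
    return max(best.values(), key=lambda entry: entry[0])[1]


# Hypothetical occupations, uniform start, and mildly "sticky" transitions.
occupations = ["software engineer", "data scientist", "product manager"]
start = {o: 1 / 3 for o in occupations}
trans = {p: {c: 0.5 if p == c else 0.25 for c in occupations} for p in occupations}

titles = ["Code Ninja", "Sr. Software Eng", "Data Science Lead"]
print(viterbi(titles, occupations, start, trans))
```

The transition term is what lets a vague title such as "Code Ninja" be resolved using the neighboring titles in the same career history, instead of being matched in isolation.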
Social Inference
I worked on a social inference framework over terabytes of Facebook data. I wrote SQL programs to process the data quickly and used R to visualize and analyze it.
Workforce Optimization
This was a Xerox project. Xerox/ACS maintains call centers, and we tried to optimize the number of call center agents on duty. We were given a database without much explanation, so we explored it and identified the tables we needed. We then modeled call arrivals as bucketed Poisson arrivals. Our call arrival model was highly accurate, with R-squared values above 0.96. We then used AI planning technology to plan agent hiring and firing strategies, and used Erlang calculations to identify the number of agents needed to satisfy the service level. The final product was delivered as SaaS, built on a jQuery/Tomcat/Servlet/Hibernate stack.
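The staffing step can be sketched with the standard Erlang C formula. The text names only "Erlang", so treating it as Erlang C is an assumption, and the call volume, handle time, and service-level target below are hypothetical numbers, not project data. The arrival rate would come from the Poisson model for a given time bucket.

```python
import math


def erlang_c_wait_probability(agents, offered_load):
    """Probability that an arriving call has to wait (Erlang C)."""
    if agents <= offered_load:
        return 1.0  # unstable queue: every call waits
    blocking = 1.0  # Erlang B with zero agents
    for n in range(1, agents + 1):
        # Standard Erlang B recursion, numerically stable for large n
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return agents * blocking / (agents - offered_load * (1.0 - blocking))


def service_level(agents, calls_per_second, handle_time_s, answer_within_s):
    """Fraction of calls answered within answer_within_s seconds."""
    offered_load = calls_per_second * handle_time_s  # in erlangs
    if agents <= offered_load:
        return 0.0
    wait = erlang_c_wait_probability(agents, offered_load)
    return 1.0 - wait * math.exp(
        -(agents - offered_load) * answer_within_s / handle_time_s
    )


def agents_needed(calls_per_second, handle_time_s, answer_within_s, target):
    """Smallest agent count meeting the target (requires 0 < target < 1)."""
    agents = max(1, math.ceil(calls_per_second * handle_time_s))
    while service_level(agents, calls_per_second, handle_time_s, answer_within_s) < target:
        agents += 1
    return agents


# Hypothetical bucket: 100 calls per 30 minutes, 180 s average handle time,
# target of 80% of calls answered within 20 s.
rate = 100 / 1800
print(agents_needed(rate, 180, 20, 0.80))
```

Running this per time bucket gives a staffing requirement curve across the day, which a planner can then use as the demand side of a hiring and scheduling problem.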
Factored Particle Filtering for War Situation Tracking
In war, it is critically important to identify the current situation from live data streaming in from the battlefield. We modeled the streaming field data as factored particles and developed a situation awareness framework based on the factored particle filtering paradigm. I developed the idea and delivered the Java code.
Massive War Situation Exploration
The number of potential situations in war is extremely large. I used a compact representation and a BDD-style compression technique to efficiently explore the practically infinite space of possible situations. This was originally coded in Lisp, and I ported it to Java.
Learning from Military Experts
In an air campaign, airspace is a scarce resource that every unit must share. The airspace manager maintains a feasible airspace schedule. How the manager decides to accept a mission, or when to request modification of the original airspace request, is not explicit. But we had the data recorded from the managers' operations. From that data we learned how managers selected particular missions and when they asked for modifications. I built the knowledge representation framework, designed the machine learning algorithm for the highly skewed data distribution, and coded and delivered it in Java.
Learning from Problem Solving
This was my whole thesis. I tried to learn problem-solving strategies from human demonstration. For example, I tried to learn Hearts strategy from human play. In doing so, I learned machine learning (classic and modern, inside and out), knowledge representation techniques, probabilistic reasoning and statistics, and optimization.
Journal Publication
Learning Probabilistic Hierarchical Task Networks as Probabilistic Context-Free Grammars to Capture User Preferences
Nan Li, Will Cushing, Subbarao Kambhampati, Sungwook Yoon. 2012
Transactions on Intelligent Systems and Technology (TIST)
An Ensemble Architecture for Learning Complex Problem-Solving Techniques From Demonstration
Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Darren Scott Appling, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R Michaelis, Hala Mostafa, Santiago Ontanon, Charles Parker, Jainarayan Radhakrishnan, Antons Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Dan Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong. 2011
Transactions on Intelligent Systems and Technology (TIST)
Learning Linear Ranking Functions for Beam Search with Application to Planning
Yuehua Xu, Alan Fern and Sungwook Yoon. 2009
Journal of Machine Learning Research (JMLR)
Learning Control Knowledge for Forward Search Planning
Sungwook Yoon, Alan Fern and Robert Givan. 2008.
Journal of Machine Learning Research (JMLR)
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes
Alan Fern, Sungwook Yoon, Robert Givan. 2006.
Journal of Artificial Intelligence Research
Conference Publication
Anticipatory Online Planning
Ethan Burns, Wheeler Ruml, J. Benton, Minh Binh Do, Sungwook Yoon.
International Conference on Automated Planning and Scheduling (ICAPS-2012)
Hybrid Qualitative Simulation of Military Operations
Thomas Hinrichs, Kenneth Forbus, Johan de Kleer, Sungwook Yoon, Eric Jones, Robert Hyland, Jason Wilson.
Innovative Applications of Artificial Intelligence (IAAI-2011)
Improving Determinization in Hindsight for On-line Probabilistic Planning
Sungwook Yoon, Wheeler Ruml, J. Benton, Minh Binh Do
International Conference on Automated Planning and Scheduling (ICAPS-2010)
Iterative Learning of Weighted Rule Sets for Greedy Search
Yuehua Xu, Alan Fern, Sungwook Yoon
International Conference on Automated Planning and Scheduling (ICAPS-2010)
Factored Envisioning
Johan De Kleer, Kenneth D. Forbus, Tom Hinrichs, Sungwook Yoon and Eric K. Jones
Qualitative Reasoning (QR-2009)
Learning Probabilistic Hierarchical Task Networks to Capture User Preferences
Nan Li, Subbarao Kambhampati and Sungwook Yoon
International Joint Conference on Artificial Intelligence (IJCAI-2009)
An Ensemble Learning and Problem Solving Architecture for Airspace Management
Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Darren Scott Appling, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R Michaelis, Hala Mostafa, Santiago Ontanon, Charles Parker, Jainarayan Radhakrishnan, Antons Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Dan Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong.
Innovative Applications of Artificial Intelligence (IAAI-2009)
An Online Learning Method for Improving Over-subscription Planning
Sungwook Yoon, J. Benton, Subbarao Kambhampati
International Conference on Automated Planning and Scheduling (ICAPS-2008)
Probabilistic Planning via Determinization in Hindsight
Sungwook Yoon, Alan Fern, Subbarao Kambhampati and Robert Givan
National Conference on Artificial Intelligence (AAAI-2008)
FF-Replan: A Baseline Probabilistic Planner
Sungwook Yoon, Alan Fern and Robert Givan
International Conference on Automated Planning and Scheduling (ICAPS-2007)
Using Learned Policies in Heuristic-Search Planning
Sungwook Yoon, Alan Fern, and Robert Givan
International Joint Conference on Artificial Intelligence (IJCAI-2007)
Discriminative Learning of Beam-Search Heuristics for Planning
Yuehua Xu, Alan Fern, and Sungwook Yoon
International Joint Conference on Artificial Intelligence (IJCAI-2007)
Learning Heuristic Functions from Relaxed Plans
Sungwook Yoon, Alan Fern, and Robert Givan
International Conference on Automated Planning and Scheduling (ICAPS-2006)
Learning Measures of Progress for Planning Domains
Sungwook Yoon, Alan Fern, and Robert Givan
National Conference on Artificial Intelligence (AAAI-2005)
Learning Domain-Specific Control Knowledge from Random Walks
Alan Fern, Sungwook Yoon, and Robert Givan
International Conference on Automated Planning and Scheduling (ICAPS-2004)
Approximate Policy Iteration with a Policy Language Bias
Alan Fern, Sungwook Yoon, and Robert Givan
Advances in Neural Information Processing Systems 16 (NIPS-2003)
Inductive Policy Selection for First-Order Markov Decision Processes
Sungwook Yoon, Alan Fern, and Robert Givan
Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-2002)
Book Chapters
Explanation-Based Learning for Planning
Subbarao Kambhampati, Sungwook Yoon
Encyclopedia of Machine Learning 2010
Reinforcement Learning in Relational Domains: A Policy-Language Approach
Alan Fern, Sungwook Yoon and Robert Givan. 2005
Statistical Relational Learning, Editor: Lise Getoor, Ben Taskar
Workshop papers, Competition Proceedings
Towards Model-lite Planning: A Proposal for Learning and Planning with Incomplete Domain Models
Sungwook Yoon and Subbarao Kambhampati
ICAPS 2007 Workshop, AI Planning and Learning
Hierarchical Strategy Learning with Hybrid Representations
Sungwook Yoon and Subbarao Kambhampati
AAAI 2007 Workshop, Acquiring Planning Knowledge via Demonstration
Discrepancy Search with Reactive Policies for Planning
Sungwook Yoon
AAAI 2006 Workshop, Learning for Search
Learning Reactive Policies for Probabilistic Planning Domains
Sungwook Yoon, Alan Fern and Robert Givan
Proceedings of the First International Probabilistic Planning Competition
Relational Reinforcement Learning for Classical Planning
Alan Fern, Sungwook Yoon and Robert Givan
Workshop on Relational Reinforcement Learning
Professional Activities
Co-Organizer
The Third International Probabilistic Planning Competition, 2011
Volunteer Organizer
The Second International Probabilistic Planning Competition, 2006
Tutorial Instructor
Learning Techniques in Planning at ICAPS07
Reviewer, PC Member
Computational Intelligence, Machine Learning, IEEE Transactions On Automatic Control, Journal of Artificial Intelligence, Artificial Intelligence, AAAI, IJCAI, ICML, ICAPS