Objective
Education
M.S. in Computer Science
The University of Texas at Dallas | 08/2014-12/2016 | Dallas, USA
Summary
Hands-on experience with web and Android development, big data, search engines, and natural language processing
- Languages: Java, Python, SQL, R, PHP, and Scala
- Web & Android: Ruby on Rails, Spring Boot, RESTful APIs, Android SDK, Git, Bootstrap, JavaScript, jQuery, MySQL
- Web: HTML, CSS, JavaScript, jQuery, DOM, JSON, XML, AJAX, PHP
- Big Data: Hadoop, MapReduce, HDFS, Docker, Spark (Streaming), Cassandra, Hive, Pig, Kafka, MongoDB, Tableau
Employment
Data Engineer Intern
Mount Sinai Health System | Summer 2015 | New York, USA
- Extracted, transformed, and loaded large medical data sets using R and Spark
- Designed and implemented a data pipeline that enabled batch analysis
- Built predictive models for specific cancer drugs by training and testing machine learning classifiers (logistic regression, SVM, decision trees, etc.) on gene mutation data
Data Analyst
Ping An Insurance Group | 04/2010-08/2014 | Shanghai, China
- Implemented internal data processing tools that the business team used for financial ETL
- Back-end development: designed cross-platform system integration interfaces, implemented system enhancements, and developed new features for the existing system
- Front-end development: used HTML/CSS and jQuery (AJAX) to dynamically render the landing page, search page, profile creation page, and others
Projects
Big Data Analytics and Recommendation System (Hadoop, Pig, Hive, Cassandra, Spark, and Mahout)
- Implemented chained MapReduce jobs with both in-memory and reduce-side joins; used secondary sorting and custom partitioning in MapReduce jobs on HDFS to produce the required output
- Wrote complex Pig Latin, Hive, and Cassandra queries to analyze the IMDb movie database
- Developed user-defined functions (UDFs) in Pig and Hive to filter data on various constraints
- Built recommendations using implicit-feedback collaborative filtering with Spark, Scala, and MLlib
- Developed a movie recommendation system using Mahout's item similarity matrix on IMDb movie data
Sentiment Analysis on Twitter (Natural Language Processing)
- Collected tweet text through the Twitter API using Python
- Analyzed user mood using natural language processing techniques
- Plotted tweet locations on Google Maps through the Google Maps API using Python
Search Engine
- Built a web-based search engine for a collection of lyrics crawled from 120,000 web pages
- Indexed the data using Lucene and an n-gram model
- Analyzed search results using natural language analysis tools
- Improved HITS and PageRank results by clustering with multiple machine learning methods