I developed scalable web applications and analyzed real-world big data in both startup and industry environments.
Hi, my name is Diana Xu, and I recently earned my master’s degree in Computer Science from Cornell. I bring hands-on experience in developing web apps, analyzing real-world data, and building data pipelines. I’m driven to solve complex problems with reliable, effective solutions, and I continuously seek opportunities to learn and grow. I’m looking for full-time roles in software engineering, particularly in frontend, full-stack, web development, or data analysis, but I’m also open to other related positions!
JavaScript, Python, C++, React, Next.js, Node.js, Flask, RESTful APIs, HTML, CSS, Tailwind CSS, SQL, etc.
2024 Cornell MEng in Computer Science, 2023 UMich BS in Computer Science and Data Science
Including a med-tracking portal, a research data analysis tool, a book-award ML prediction model, etc.
Developed more than 5 web applications, from school projects to internship tasks
Solid knowledge of and practical experience with object-oriented programming, data structures, and algorithms
Experienced in cleaning and analyzing 10+ million raw data records of different types
Designed watch, phone, and web platforms with Figma
Med-tracking Portal | Startup Project | Jan 2024 - Present
This portfolio website is my way of showcasing my skills while diving into new challenges and learning along the way! I discovered how powerful and flexible Tailwind CSS is, using it to create fun animations and unique styles effortlessly. I also focused on making the site accessible and responsive, with features like a navigation bar that magically adapts to any screen size. This project has been such an exciting journey, blending creativity and problem-solving to build something intuitive, inclusive, and truly me!
Automated Pipeline for Data Insights | Intern Project | May 2022 - Aug 2022
Driven by my passion for big data and a desire to simplify complex processes, I found my two-term internship at Sumitovant Biopharma an incredible opportunity to transform how data analysis is done. I developed an in-house tool in Python and R (connected via rpy2) to streamline network meta-analysis (NMA) on clinical data, which ranks treatments across multiple studies. The tool resolved inconsistent treatment names using predictive similarity scoring with 95% accuracy, and it automated data transformation. It also applied the appropriate analysis model for each data type (binary, proportional, numerical, etc.) and delivered results within seconds, making data analysts’ work faster and more efficient.
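The name-matching step can be illustrated with a minimal sketch. It is not the internal tool: it assumes a plain string-similarity ratio from Python's standard `difflib` in place of the predictive scoring described above, and the function names, the 0.8 threshold, and the sample treatment names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def normalize_treatment(raw_name, canonical_names, threshold=0.8):
    """Map a raw treatment name to its closest canonical name.

    Returns None when no canonical name reaches the (assumed) threshold,
    so that uncertain matches can be reviewed by a person instead.
    """
    if not canonical_names:
        return None
    best = max(canonical_names, key=lambda name: similarity(raw_name, name))
    return best if similarity(raw_name, best) >= threshold else None


# Hypothetical canonical list, for illustration only.
CANONICAL = ["placebo", "ibuprofen", "acetaminophen"]

print(normalize_treatment("Ibuprofin", CANONICAL))
print(normalize_treatment(" Placebo ", CANONICAL))
print(normalize_treatment("aspirin", CANONICAL))
```

Returning None below the threshold, rather than always taking the closest name, keeps doubtful matches out of the downstream analysis.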
This news search platform, built with React and Next.js, brings the latest New York Times articles right to your fingertips through the NY Times API! Beyond just browsing, it lets users save their favorite articles and offers a seamless, personalized experience. When users are logged in, the platform (powered by Firebase for web analytics and Firestore as the non-relational database) remembers their search history, which they can edit anytime. It’s designed to make revisiting topics or rediscovering past interests effortless and enjoyable.
This is an Instagram clone project for the EECS485 course at UMich, the first course in which I learned how to build a web application. Creating it was an amazing learning experience that developed both my front-end and back-end skills. React was used to render dynamic client-side pages for the Instagram home screen and for individual posts. Python and Flask were used to build the REST API, Flask cookies were used to store usernames and authenticate users, and SQL was used for database access and storage. Finally, the web app was deployed on an AWS EC2 instance.
Book-award ML Prediction | Research Project | Jan 2022 - Dec 2022
As part of the UMich Multidisciplinary Design Program sponsored by ProQuest, we developed a machine learning model to predict the likelihood of books winning awards, designed for deployment on Rialto, an academic marketplace for data-driven librarian decisions. Facing challenges such as award winners making up only 0.03% of a 10M+ book dataset and 80% missing data in numerous fields, we invested significant effort in merging datasets, handling missing values, and transforming raw data into usable insights. We analyzed 6 data sources with a total of 75 features, from which we selected or derived 6 features for our final model. Through rigorous preprocessing, time-series analysis, and Random Forest Regression, we achieved a 31% match rate between the predicted and actual top 100 books, highlighting the strength of our data preparation and modeling pipeline.
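The match-rate metric can be sketched as follows. This is only one plausible reading: it assumes "match rate" means the fraction of the predicted top-k books that also appear in the actual top-k, which the project description does not spell out. The function name and the toy scores are hypothetical.

```python
def top_k_match_rate(predicted_scores, actual_scores, k=100):
    """Fraction of the predicted top-k items that also appear in the actual top-k.

    Both arguments map an item id to a score; higher means more likely to win.
    This set-overlap definition is an assumption, not the project's stated metric.
    """
    top_predicted = set(sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k])
    top_actual = set(sorted(actual_scores, key=actual_scores.get, reverse=True)[:k])
    return len(top_predicted & top_actual) / k


# Toy data for illustration only.
predicted = {"book_a": 0.9, "book_b": 0.8, "book_c": 0.1, "book_d": 0.05}
actual = {"book_a": 3, "book_b": 0, "book_c": 2, "book_d": 1}

print(top_k_match_rate(predicted, actual, k=2))
```

With so few award winners in the data, an overlap measure on the top of the ranking says more than overall accuracy, which a model could maximize by predicting that no book wins.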