Hi, my name is Margaret Reed
I am an MS Candidate in Data Science at Columbia University.

About me

In Sweden!

In this day and age, data is more important than ever. However, good data is not very accessible to most ordinary folks, so I am a firm believer in well-designed, interactive data visualizations that enable everyone to make their own data-driven decisions.

I am pursuing my M.S. in Data Science at Columbia University.

I have experience with both software development and data science. I have also taken several classes pertaining to both, including classes on algorithms, machine learning, data science, Bayesian statistics, linear algebra, and more. I have participated in several hackathons and datathons, including Duke's DataFest, where I won the prize for best visualization. My work experience includes summer internships with Amazon, Lenovo, and Apple.

Born and raised in Chapel Hill, NC, I have always enjoyed living by a university, with all of the academics and opportunities it affords. Having attended Duke University, and now Columbia, I get to take full advantage of the university perks. I enjoy doing research, teaching, and working on data projects that excite me. In undergrad, I worked with a research group that analyzed long-term, whole-ecosystem data from the ELA (Experimental Lakes Area) to best visualize and communicate the crucial environmental findings. The summer after my sophomore year, I interned at Amazon as a software development engineering intern. Naturally, I missed my coursework so much that I had to double down and dip my toe into teaching. Starting my junior year, I was a teaching assistant for two undergraduate courses: STA 199 and CS 216. The summer after my junior year, I interned at Lenovo as a Data Visualization intern.

Once I graduated from Duke, I was still not done with academia, so I started a new journey at Columbia University, where I am now pursuing my Master’s in Data Science. After my first year, I interned at Apple as a Data Engineering intern in sunny Cupertino. I am now in my final semester at Columbia, where, when I am not in class, I work for admissions at the Columbia Data Science Institute. However, after five and a half years in higher education, I am ready to join the workforce and try something new!

Other activities I pursued while in undergrad included participating in the summer program DTech as a scholar. Additionally, in my junior year, I volunteered on a political campaign as a data analyst. Two other undergraduate students and I ran analyses for and strategized with the campaign team in order to best support them. I also volunteered on the communications and marketing working group with Bluebonnet Data, the organization I volunteered with in the fall. In my senior year, I volunteered for AccessiBull Healthcare as a social media / marketing assistant.

Outside of academics, I love to travel; some of my favorite locations so far have been the UK, Costa Rica, Germany, and Sweden (pictured). I love music and have been playing the cello for over 10 years. I have kept it up in college by playing in the pit orchestra for musicals with the student-led musical theater group Hoof 'N' Horn.

View Resume

Contact me!

Projects

Redistricting

As part of a class project, I began investigating redistricting algorithms and playing around with my own.

GitHub repo

Abstract-ify

For a group project in my data visualization class, my team and I created an R Shiny app that extracts colors from, simplifies, and creates an outline for any image.

GitHub repo

Bluebonnet Data

I volunteer as a data analyst with an organization that supports down-ballot political campaigns. Here is a blog post I wrote about my experience.

My blog post

Job experience

Apple Data Engineering Intern (May - Aug. 2023)

As a Data Engineering Intern at Apple, I created a tool to predict compute warehouses for Snowflake queries. I worked with ML models and APIs to build this tool into a Python package so that it could be integrated with existing pipelines on Airflow. The tool is projected to save millions of dollars, as Snowflake does not provide this capability. In addition to my intern project, I participated in Apple’s intern pitch competition, iContest, where my team placed in the top 3 out of 150 intern groups for our idea.

Duke University UTA (Jan. 2021 - May 2022)

I worked as an Undergraduate Teaching Assistant in two classes: STA199 - Intro to Data Science and CS216 - Everything Data. I graded assignments, held office hours, and led lab sections.

Lenovo Data Visualization Intern (June - July 2021)

During my internship at Lenovo, I was a Data Visualization Intern for the Digital Transformations team. I supported the migration from Qlik Sense to Microsoft Power BI and provided data visualization consulting for internal groups by running code reviews and hosting weekly training sessions. I created a Power BI report with 30+ pages of fully functioning dashboards and best-practices content. Alongside a team of interns, I designed a wirelessly charging backpack for the Lenovo Incubator Project. We won the Best Overall Quality Award and the Peers Choice Award for this project.

Duke University Math Department (Jan. - Dec. 2020)

I worked as a tutor in the Duke Math Help Room for Math 212 - Multivariable Calculus. I helped students with homework and questions regarding multivariable calculus.

Duke University Admissions (July - Aug. 2019)

I worked as a tour guide, giving tours of Duke's campus to prospective students. I gained experience with public speaking and interacting with a wide variety of people.

Amazon SDE Intern (May - Aug. 2020)

As an SDE intern for Amazon, I learned and adapted to the team's architecture quickly. I developed an internal query tool that allowed other members of my team to easily cross-reference data from multiple databases. I owned both the frontend and backend aspects of this project, leveraging skills in Java, JS, and SQL for development. I worked primarily in Java, using the Spring MVC framework, manipulating JSON nodes, and testing with Mockito unit tests.

Clinical Tools Research Intern (June - Aug. 2017)

As a research intern, I researched opioid addiction and obesity to determine how best to educate medical professionals and their patients about fighting these problems. I created online tools to assist medical professionals in the treatment of their patients, such as VR games for teaching patients about healthy versus unhealthy foods.

Undergraduate Coursework

Computer Science

CS101 (Intro to Computer Science)

Introduction to the practices and principles of computer science and programming and their impact on and potential to change the world. Algorithmic, problem-solving, and programming techniques in domains such as art, data visualization, mathematics, natural and social sciences. Programming using high-level languages and design techniques emphasizing abstraction, encapsulation, and problem decomposition. Design, implementation, testing, and analysis of algorithms and programs.

CS201 (Data Structures and Algorithms)

Analysis, use, and design of data structures and algorithms using an object-oriented language like Java to solve computational problems. Emphasis on abstraction including interfaces and abstract data types for lists, trees, sets, tables/maps, and graphs. Implementation and evaluation of programming techniques including recursion. Intuitive and rigorous analysis of algorithms.

CS216 (Everything Data)

Study of data and its acquisition, integration, querying, analysis, and visualization. Concepts and computational tools for working with unstructured, semi-structured, and structured data and databases. Interdisciplinary perspectives of data and its impact crossing science, humanities, policy, and social science. Culminating team project applied to real datasets.

CS316 (Intro to Database Systems)

Databases and relational database management systems. Data modeling, database design theory, data definition and manipulation languages, storage and indexing techniques, query processing and optimization, concurrency control and recovery, database programming interfaces. Current research issues including XML, web data management, data integration and dissemination, data mining. Hands-on programming projects and a term project.

CS330 (Design/Analysis of Algorithms)

Design and implementation of modern algorithms. Stresses application- and project-based development of algorithmic techniques. Emphasis on algorithmic ideas that have had substantial impact in the real world, including approximation, randomization, hashing, streaming, spectral techniques, optimization, and search. Project-driven: several homework assignments as well as a larger student-driven course project researching, designing, and implementing algorithms for a substantive problem with real-world applications.

CS333 (Algorithms in the Real World)

CS371 (Elements of Machine Learning)

Fundamental concepts of supervised machine learning, with sample algorithms and applications. Focuses on how to think about machine learning problems and solutions, rather than on a systematic coverage of techniques. Serves as an introduction to the methods of machine learning.

Statistics

STA112 (Data Science)

Combines techniques from statistics, math, computer science, and social sciences to learn how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Case studies include examples from election forecasts, movie reviews, and online dating match algorithms. Discussions around reproducibility, data sharing, and data privacy will accompany these case studies. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Course will focus on the R statistical computing language.

STA210 (Regression Analysis)

Extensive study of regression modeling. Multiple regression, weighted least squares, logistic regression, log-linear models, analysis of variance, model diagnostics and selection. Emphasis on applications. Examples drawn from a variety of fields.

STA313 (Advanced Data Visualization)

This course is all about the art and science of visualizing data. Learn about the what (types of visualizations, tools to produce them), the how (start with a design, pre-process the data, map it to graphical attributes, make strategic decisions about visual encoding, post-process for readability and visual appeal), and the why (the theory behind the grammar of graphics). Evaluate the clarity, effectiveness, and honesty of visualization choices and improve (your and others') visualizations through an iterative design process. Discuss the role of statistical graphics in modeling and inference. Do it all in R, reproducibly, and using a variety of modern data visualization packages.

STA323 (Statistical Computing)

A practical introduction to statistical programming focusing on the R programming language. Students will engage with the programming challenges inherent in the various stages of modern statistical analyses including everything from data collection/aggregation/cleaning to visualization and exploratory analysis to statistical model building and evaluation. This course places an emphasis on modern approaches/best practices for programming including: source control, collaborative coding, literate and reproducible programming, and distributed and multicore computing.

STA360 (Bayesian and Modern Statistics)

Principles of data analysis and advanced statistical modeling. Bayesian inference, prior and posterior distributions, multi-level models, model checking and selection, stochastic simulation by Markov Chain Monte Carlo.

STA432 (Stat. Learning and Inference)

Estimators and properties (efficiency, consistency, sufficiency); loss functions. Fisher information, asymptotic properties and distributions of estimators. Exponential families. Point and interval estimation, delta method. Neyman-Pearson lemma; likelihood ratio tests; multiple testing; design and the analysis of variance (ANOVA). High-dimensional data; statistical regularization and sparsity; penalty and prior formulations; model selection. Resampling methods; principal component analysis, mixture models.

Mathematics

MATH212 (Multivariable Calculus)

Partial differentiation, multiple integrals, and topics in differential and integral vector calculus, including Green's theorem, the divergence theorem, and Stokes's theorem.

MATH221 (Linear Algebra)

Systems of linear equations and elementary row operations, Euclidean n-space and subspaces, linear transformations and matrix representations, Gram-Schmidt orthogonalization process, determinants, eigenvectors and eigenvalues; applications. Introduction to proofs.

MATH230 (Probability)

Probability models, random variables with discrete and continuous distributions. Independence, joint distributions, conditional distributions. Expectations, functions of random variables, central limit theorem.

MATH356 (Differential Equations)

First and second order differential equations with applications; linear systems of differential equations; Fourier series and applications to partial differential equations. Additional topics may include stability, nonlinear systems, bifurcations, or numerical methods.

MATH431 (Real Analysis)

Algebraic and topological structure of the real number system; rigorous development of one-variable calculus including continuous, differentiable, and Riemann integrable functions and the Fundamental Theorem of Calculus; uniform convergence of a sequence of functions; contributions of Newton, Leibniz, Cauchy, Riemann, and Weierstrass.

MATH465 (Intro High Dim Data Analysis)

Geometry of high-dimensional data sets. Linear dimension reduction, principal component analysis, kernel methods. Nonlinear dimension reduction, manifold models. Graphs. Random walks on graphs, diffusions, PageRank. Clustering, classification, and regression in high dimensions. Sparsity. Computational aspects, randomized algorithms.

Contact