DAO 2014-2010 Inductees

  • [Back to DAO Page of Fame]

    Erzinger Photo

    Dupont image

    Martin Erzinger

    Company: DuPont Pioneer

    With the help of company statisticians, Martin built a LOGISITC REGRESSION model to predict which potential customers (owner/operators of large US corn and soybean farms) would be most interested in Dupont Pioneer’s new precision agriculture software package, called Encirca℠ Yield. Encirca℠ Yield allowed farmers to use nitrogen (the second largest cost beyond the seeds themselves) more efficiently, and Martin’s model allowed Pioneer to deploy a new statistical ranking dashboard to its salesforce across the 12 major US corn states.

    Martin worked as a Marketing Consulting for DuPont Pioneer, the largest U.S. producer of hybrid seeds and solutions for agriculture. Given the importance of nitrogen as an input (difficult to manage & expensive) as well as the software’s ability to save farmers significant input costs (up to $100 per acre), the July 14 launch was crucial to the overall success of the Encirca℠ platform.

    In order to build a statistical model, Martin used newly available sales data from a previous software offering as a dependent variable and worked with the Pioneer statistics team to set up a logistical regression using internal sales and acreage data as candidate explanatory variables. Three primary purchase predictors were identified, which formed the basis for weighting the relative attractiveness of the customer from 0 to 1. Based on these scores, customers were divided into sales territories and ranked.

    To ensure the implementation of the ranking system at the salesperson level, Martin created a customizable Excel dashboard that listed all customers in a territory along with their scores and other relevant demographic and sales data. Leveraging strong buy in among the Encirca℠ team, the statistical ranking dashboard was delivered to sales territories across the 12 major corn-producing states.  

    2014-07-20 19.49.29

    Sasha Friedman

    Company: A Large Logistics Company

    Sasha Friedman developed a set of variables and weight-and-rate method for identifying the global shipping lanes for which multi-modal freight services (those that use some combination of ocean and air) were most promising for the company to explore and develop.

    Given the limited market information on multi-modal use, Sasha’s analysis of existing single-mode industry data was used to predict a downshift from expensive air freight to slower but less expensive forms of transportation such as ocean.

    She used pivot tables to separate these data on a lane and industry segment basis to identify potential indicators of multi-modal use. Using industry data over several years, she created a series of variables to capture these trends toward multi-modal usage.

    The variables were identified based on their correlation with overall conditions that signal the use of multi-modal shipping. Sasha then assigned weights to these variables based on the strength of that correlation to multi-modal usage in each lane. These weights were assigned using a combination of market data, industry insight and internal statistics.

    For the lanes identified as most promising in the model, Sasha recommended additional research to investigate potential product offerings, cost structures and revenue opportunities to move forward in multi-modal expansion.

      Vaibhav Image

    Vaibhav Laddha

    Company: A Large E-tailer

    Vaibhav used regression analysis to create a more effective metric for measuring the performance of the firm's hourly associates. Because the improved metric more correctly measures performance, it will eliminate the acknowledged game playing among the associates and lead to better promotion and termination decisions.

    Although there were clearly two dimensions to the work each hourly associate accomplished during a shift, only the more important of the two was used to measure performance. The challenge was how to acknowledge both dimension and create a better performance metric that was easy to understand, accept, and implement. Starting with a simple multiple regression, the coefficients of the two dimensions were used to create an improved unit of work that was a properly weighted combination of the two dimensions. Using these weights, the new metric is the ratio of actual hours to benchmark hours given the amount of work accomplished along the two dimensions. The new metric was enthusiastically endorsed by supervisors, in large part, because they expect that it will eliminate the game playing observed among the hourly associates. Because only work along the first dimension was being tracked, whenever possible savvy associates self-selected jobs with relatively low components of work along the second dimension—thereby artificially improving their performance results. 

      mollengarden photo

    Bain and Company Logo

    Click for larger view

    Jason Mollengarden

    Company: Bain and Company

    Jason used regression analysis to help a large manufacturing company understand the predictors of the profitability of its individual customers. The regression residuals identified customers deserving special attention, and the model will also be used to guide future customer acquisition activities.

    Jason worked with the senior leadership team of a large manufacturing company. The client was experiencing large variability in the profitability of its individual customers. Jason executed a survey to gather data on customer characteristics and then mined company data to create metrics describing the individual customers. Jason then used regression analysis to understand the predictors of the profitability of the individual customers. The regression model was able to explain the majority of the variation and identified customers with large residuals deserving special attention. This enabled the team to put together a plan to address re-pricing customers where profits were below or above the expected profit per the model. It then aided in guiding a discussion with the client regarding what characteristics the model was not capturing. The analysis will also be used to guide future customer acquisition activities by aiding in a reexamination of the current sales pipeline and by focusing future prospecting efforts on customers with desired profitability characteristics.

      Rao Image

    RAO Logo

    Archana Rao

    Company: The Boston Consulting Group

    Archana used pivot tables to summarize/analyze 50 educational performance variables for 200+ countries over a 480 month period and group the countries into clusters. Within each cluster, she used regression analysis to uncover the determinants of literacy rates and created an interactive tool to be used by educational ministries around the world to benchmark and improve their educational policies. The tool will be introduced at the DAVOS 15 conference.

    The client for Archana’s project was a global economic non-profit organization. After identifying and collecting education infrastructure variables for 200+ countries using publicly available data like the UNESCO institute of statistics and the world bank, Archana used pivot tables to identify those countries with missing data for which linear trend analysis was used to fill the gaps. Subsequent regression analyses revealed the weights to apply to each variable to best describe country-to-country variation in youth and adult literacy rates and sort countries into quartiles. In a final step, Archana’s organization commissioned 40+ education experts from around the world to rate each of 20+ detailed candidate solutions for each cluster. Those summary ratings become best practice recommendations for each cluster and were included in an interactive tool to be used by countries to guide future improvement efforts.

      Saxton Image

    Will Saxton

    Company: PricewaterhouseCoopers

    Will used pivot tables to identify hundreds of millions of dollars in potential savings for one of the country’s largest purchasers of household moving services. Those savings will accrue through the bundling of the near-simultaneous moves of multiple households between common pairs of cities.

    The concept is fairly straightforward. Rather the pay one moving firm to move a household from city A to B and a separate firm to move a second household from B to A, why not find one firm to provide both moves….at a better price? To begin to quantify the size of the opportunity, Will used pivot tables to summarize the results of approximately 350,000 moves (rows in the data set) the organization had purchased in the past. Despite the rather obvious opportunity to exercise its buying power (the 350K moves represented approximately 20% of all domestic and international moves), the client organization had been simply rotating its contracts among a long list of 500+ approved suppliers of moving services. With a capability to summarize the data by city-pair and time window, the next step was to estimate the amount of potential savings for the more popular city pairs. This estimation required Will to understand the existing tactics and economics of cross-country moves and impose realistic assumptions about the lower costs incurred with the bundling of contracts. The “hundreds of millions in potential savings” was the highlight of Will’s team’s end-of-summer report which gained approval from key client stakeholders for future implementation

    Andrew KritzerAndrew-Kritzer

    Company: LinkedIn (Mountain View, CA)
    Category: Data visualization, Regression

    LinkedIn is using the power of big data to connect talented individuals with economic opportunity. Today, the largest portion of LinkedIn's revenue comes from a flagship Software-as-a-service (SaaS) product designed to help recruiters identify and recruit candidates. The software offers advanced search, time-saving contact, and collaboration features above and beyond ordinary premium subscriptions. Andrew interned within LinkedIn's Business Operations team analyzing usage patterns to support product changes and feature development within this flagship Recruiter product.

    In order to start the analysis, Andrew first worked with a team to compile a dataset of over 5 million records summarizing June 2013 feature usage for tens of thousands of users. After categorizing detailed features into high level feature sets, Andrew created an interactive dashboard of histograms showing the percent of users using each feature set. The dashboard has parameters for changing the histogram bin size in order to understand features that were frequently used as well as those that were infrequently used. The analysis showed some surprising results and helped drive a number of suggested product changes, with one change accepted on the spot to improve usability.

    The second part of this project was comparing feature usage to Net Promoter Score (NPS). NPS is the likelihood that a customer will recommend the product to a colleague, a strong indication of customer satisfaction.  Andrew filtered the population for those with NPS, used a goodness of fit test to determine if the sample was representative and converted the feature usage to binary variables for each user. Next, he ran a stepwise linear regression to pinpoint which features correlated with higher (and lower) NPS scores. A few major features had statistically significant relationships, leading to discussions of customer education for positively contributing features and development for negatively contributing features.

    Chris LawrenceChris-Lawrence

    Company: Cushman & Wakefield (Washington, DC and Arlington, VA
    Categories: Pivot tables, time series models, multiple regression

    Chris worked in the Corporate Finance group at Cushman & Wakefield, a real estate services firm with a customer base ranging from small businesses to Fortune 500 companies. The Corporate Finance group provides consulting services for global corporations that need help optimizing their real estate portfolios and deciding when, where and how to change their real estate footprint. This decision process can be complex, as real estate changes must be planned far in advance due to construction, relocation, etc.

    Chris was tasked with identifying changes and trends in real estate use in the technology sector, with the goal of helping some of the company's larger clients make more educated real estate decisions. To this end, Chris collected detailed real estate portfolio and financial data for 25-30 representative companies, mostly from the services, hardware and software segments of the technology industry, covering a 7-year timespan. Metrics included financial statement items and data on square footage, employee headcount, facility numbers, and type of facility.

    First, Chris used Pivot Tables/Pivot Charts and time series analysis to compare and analyze the data in intuitive, flexible ways that his managers and their clients could process quickly. This allowed the audience to see, for example, how certain metrics were changing over time by company, by sector, and across the industry as a whole. Next, he aggregated all of the data in one large spreadsheet and used dummy variables to identify each line of data by sector, company, year, revenue category, and other items. Using regression analysis, he identified factors that correlate with expansion or reduction in square footage levels and employee numbers.

    Chris's analysis provided data-driven evidence for various hypotheses that helped the company advise clients more effectively. His analysis was included in client presentations, and he was awarded "Best Presentation" for the 2013 intern class for his work on this project.

    Nathan (Nate) WeirNathan-Weir

    Company: The Washington Post (Washington, DC)
    Categories: Pivot tables, data visualization (leading to risk analysis)

    Nate's job this summer was to analyze a project that The Washington Post was contemplating launching into a new business. To understand this business, the first portion of the analysis was to determine the market size. He used pivot tables and data slices to summarize unstructured industry data (over 40k rows) about different markets. He took this information and divided it into strategic subgroups for market sizing, then used it as a constraint in the financial modeling. Second, he created a number of interactive charts that helped illustrate how different variables (such as staffing, sales productivity etc.) affected key performance indicators of this specific new business (such as Customer Lifetime Value and compensation). Finally, he performed the risk analysis around various model inputs to find opportunities to reduce risk. By designing proactive downstream decisions that would change staffing and sales efforts based on how the business reached its growth targets, the worst case became much more appealing. Because of this, this project became more viable going forward.  Ultimately, his efforts led to clear understanding of the potential of this new business and drivers of its success.

    Liz TangLiz Tang Image

    Company: Target (Minneapolis, MN) 
    Category: regression, statistical inference

    Target’s ability to value an entire market of current and future stores has become more coveted as big box retail stores saturate the United States.  Being able to value an entire market could help Target plan it prioritize markets and plan its build out strategy within each market.  Liz’s first goal was to identify how much sales Target could get from a market if they built out to saturation.

    To work on this, Liz first summarized a great amount of data from Target’s Market Research and Analytics teams to understand what metrics Target currently used to evaluate stores. Next, to normalize for market population and different levels of income, she identified the percentage of a market’s income Target was capturing as sales in the current stores (share of income).  Liz then ran numerous multiple regressions to determine which metrics were most predictive of the share of income factor.  Two of the factors identified in the regressions were store density and share of general merchandising square footage. She segmented the data for the approximately 160 markets Target has more than one store in into four categories using high/low store density and share of space.  To verify these categories were statistically unique, Liz ran two sample t-tests assuming equal variance.  They were indeed unique, so she used two dummy variables for these categories to see how predictive this categorization alone was on the share of income Target attained.  The adjusted R-squared for this regression indicated there were many other variables that would help predict the share of income received, but individual variable statistics showed that the categorization was statistically significant.

    To summarize her findings, Liz used the 75th percentile market within each category (based on share of income) to benchmark other markets within that category against.  Scaling up for population and income in the market, she identified potential sales Target could obtain in every market.  This potential sales number gave Target an idea of how close they were to market saturation.

    Saul WallerSaul Waller Image

    Company: Tough Mudder (Brooklyn, NY)
    Category: logistic regression

    Tough Mudder is a company that puts on military style obstacle course endurance events, a video on the company’s website front page [www.toughmudder.com] shows what these events are. Started only in 2010 by 2012 it had almost half-a-million participants (customers) attending its events, and projected revenue around $70M.

    Saul’s job was to analyze Tough Mudder’s customer base. He first organized their customer records. Then he analyzed some basic KPIs with respect to customer characteristics (demographics, purchasing history, etc.) and built customer profiles. Finally, he set up a number of logistic regression models that estimated the probabilities of certain customers to engage in certain behaviors: for example, which customers are most likely to attend a second event, which event participants are most likely to recruit their friends, which participants are most likely to register for an event 9 months before, 6 months before, 1 month before, and many others.

    This led to recommendations about who to target with advertising and when, ultimately allowing the company to cut a good amount of advertising spend simply from not targeting the audiences that were unlikely to convert during a particular time period. Additionally, Saul recommended a number of new, potentially very high ROI marketing tactics based on the identified customer behavior patterns.

    Saul’s work over this summer that had a big impact on the company, helping the company become more data driven and better understand customer behavior.

    Luis ArrascueLuis Arrascue Image

    Company: Interbank (Lima, Peru)
    Category: multiple regression, histograms

    Interbank is the fourth-largest bank in Peru focusing primarily on retail banking with some 250 branches all over the country.

    Luis’s job this summer was to help Interbank understand how profitable a given branch could be given its location, area demographics, size, opening hours and other characteristics of the local business environment. As a first step Luis aggregated the necessary information about the branches profitability and the underlying environment characteristics in one convenient data set. Then he built a multiple regression model that predicted branch’s profitability based the characteristics of its local business environment. This model allowed identifying “outliers” in either direction, suggesting candidate branches for more detailed analysis of the business practices, and ultimately identifying best practices and areas for further improvement.

    When presenting this work to top level executives Luis was able to efficiently use histograms for presenting the variability in the branches’ performance and environment characteristics. By seeing both the benchmark from the model and dispersion in characteristics, Luis was able to suggest where resources should be focused and how the unprofitable branches can be made profitable. Some of the results Luis presented were a revelation for the top management as they broke certain existing preconceptions about what drives the performance of different kinds of branches.

    Ryan PoppRyan Popp Image

    Company: LivingSocial (Washington, DC)
    Category: regression

    In just under three years, LivingSocial has run around 85,000 deals across more than 300 US and Canadian markets.  Traditionally, Sales has relied on 50% commission as a starting point when inking new merchants.  Furthermore, they’ve depended only on the revenue side of deals (or “net revenue” which is their actual take from the deal) and have neglected the costs.  Without considering the cost side of the equation, LivingSocial as a company cannot understand deal profitability.  Furthermore, what commission rate should a Sales rep aim for?  Can we provide operations – the people responsible for accepting and scheduling deals – additional data about the profitability of deals? 

    With close to 80,000 deals’ worth of data, Ryan first computed the Net Contribution after Cost, or profit.  Revenues were easy.  Costs, generally, included operational and promotional costs as well as credit card fees and refunds – none of which were easy to calculate given the database structure.  Furthermore, LivingSocial had scant data on the actual number of recipients for each e-mail blast, affecting any calculation of per user figures.  Given a deal’s “list size” (the number of e-mails sent for the deal), Ryan calculated the net contribution after cost for every deal in the US and Canada.  Then, he ran log-regressions for net contributions for each market, which allowed him to compare the residual (unexplained) contribution of a deal to that of any other deal regardless of market/concept.  LivingSocial tags all deals with a concept – massage, for instance – so while we know an “average” deal in a market given an e-mail list size, we can also look at the relative performance of Massage versus Lasik in a given market.  Lasik, as it turned out, had relatively high refund rates, which impacted profitability.  But, Lasik deals have historically low commission rates.  Therefore, LivingSocial could make some adjustments (based on averages of the actual residuals) when looking at market/concept pairs.

    To make it easier for the Sales and Operations force to utilize, Ryan used the regression analysis to forecast expected profit given a list size for a market/concept in combination with the historical commission rate to ultimately determine what commission rate was required to achieve an “average” deal for a given market/concept pair.  So, if Massage has traditionally outperformed other concepts in DC and has traditionally been sold at 38%, it could be sold at 32% commission and still yield a deal with average profitability.  On the flip side, under-performing concepts need to be sold at higher-than-normal commission rates to achieve average profitability – an input of a high value to the Sales and Operations teams.  Additionally, Ryan calculated the commissions required to simply breakeven:  the critical “Do Not Run” points below which Operations should never go.

    In sum, given the imperfect deal data and a handful of assumptions, Ryan was able to utilize regression analysis to better equip LivingSocial’s Sales and Operations units to set optimal commission rates and drive more profitable deals for the company.

    Jennifer McGeeJennifer McGee

    Company: A.T. Kearney (Washington, DC)
    Category: Pivot tables, regression

    Jennifer worked in a strategic sourcing role with a global food and beverage client.  She evaluated two different initiatives for the client to determine which could have the greater financial impact on 2012 cost savings.

    The first initiative was investigating maintenance and repair spending.  The client had a large amount of data from a number of different sources and required PowerPivot in order to combine all the information into one dataset.  PowerPivot is a free add-in that enhances the capabilities of PivotTable and allows linking data from multiple sources through a common identifier.  Once she linked the data Jennifer was able to use pivot tables to sort the data by type of repair and analyze the difference in maintenance costs by location to determine which repair shops were providing consistent, low-cost service.

    The second initiative was estimating the size of an opportunity to improve auction sales for delivery vehicles.  Jennifer analyzed the client’s historical data to understand correlations between the selling price and the characteristics of the vehicle such as make, model, year, and mileage, as well as auction characteristics such as date, location, and floor price.  Using multiple regression analysis, she identified opportunities for increasing auction sales by moving vehicles to a different location and maintaining a consistent floor price based on certain vehicle characteristics.

    Tripper DicksonTripper Dickson

    Company: Norfolk Southern
    Category: Optimization

    As a necessary part of shipping intermodal freight, Norfolk Southern’s (NS) customers reposition their empty containers around the country in order to move them to markets where the freight is reloaded for shipment. Customers reposition these empties without regard to NS cost structure.  In order to capture the loaded freight business, NS was shipping such empty containers at special low rates and was incurring significant losses on this aspect of the business.

    To address this problem, Tripper created an optimization model that maximized NS’s margins by optimally routing the empty containers in such a way that clients’ requirements are satisfied at no increase in the cost to clients. He started with NS’s three largest clients: each week the clients send in their empty distribution plan for the following week – terminals they plan on using and volumes by lane.  Within five minutes (utilizing VBA macros) NS sends them an optimized distribution plan with suggested lane changes.  Their planning and cost is unchanged and NS margin is maximized.  By influencing their forecasted plan, NS was able to immediately capture these gains going forward.

    The model was implemented for the last 4 weeks of Tripper’s internship and immediate margin improvements of $65,000 were recognized.  He then trained NS employees to run the model and it will continue to be used on a weekly basis to influence the distribution of virtually all empty containers moving on Norfolk Southern railways. It is estimated that the total annualized margin improvement from implementing the Tripper’s model will exceed $3 million.

    Paul RhynardPaul Rhynard

    Company: McKinsey & Co
    Category: Optimization with binary variables

    In periods of excess market capacity, Paul’s client did not properly utilize its six production facilities: all facilities continued to run at slower rates regardless of the financial impact and without considering alternate asset optimization strategies.

    Paul built a Solver model that determined the optimal operating rate for each facility under various market conditions.  Given the fixed and variable cost structures of the different facilities, some were better off running at full capacity while others should be slowed or shut down, minimizing costs and maximizing profitability.  The model optimized the production decisions (modeled continuous variables) and the facility closure decisions (modeled as binary variables) utilizing the facility costs, supply, and demand inputs out to 2020.

    The model resulted in significant benefits to the client: given the flexibility of the inputs (forecasts, costs, shutdown costs, pricing, etc.) and the speed at which it calculates (under 1 second), the management considered multiple market scenarios and set on a robust course of action. Best of all, the model enables the operations and finance departments of a $2 billion business unit to flex their operations to save millions (potentially hundreds of millions) of dollars a year in excess supply situations.  It also helped to guide future capital expenditure decisions as it brought to light how important the supply/demand balance really is to the organization.

    Ryan McCollumRyan McCollum

    Company: Accenture
    Category: Time series forecasting, regression, pivot tables

     Ryan worked for Accenture in their Supply Chain Management Consulting practice where his client was a major cable provider in the US. Ryan’s task was to improve client’s forecast accuracy for cable boxes: low forecast accuracy has been driving excess equipment purchases and unnecessary inventory levels.

    Ryan analyzed the client’s historical demand data, to understand correlations between the demand for different cable box types and technologies and ultimately built a model for the client to use when creating their forecasts.  His analysis included taking the large amount of data, using pivot tables to sort the data into meaningful and useable data segments, regression analysis to understand correlations between product types, and ultimately the development of time-series forecasting models using ARIMA methodologies.

    Keeping a portion of the data in the hold-out validation sample, Ryan established that his model improved forecast accuracy from the client’s historical 20-50% up to about 80%. Such improved forecast accuracy is estimated to result in some $30M reduction in excess inventory among other cost savings such as reduced storage and transportation costs.

    Whit WalkerWhit Walker

    Company: CNN.com
    Category: data reporting, pivot tables, optimization

    Whit worked for CNN.com’s Internal Strategy Group, focusing on Product Strategy & Partnerships. This group did not have an accurate picture of the value exchange between CNN.com and 18 strategic partners that provide content to the “highest margin” sections that are essential to CNN Digital’s long term growth.

    These content partnerships were initially structured in an ad hoc manner that focused on driving site traffic and sharing content, but didn’t initially consider the financial value derived from each. Whit created a quantitative way to analyze the performance of these partnerships, treating each as though it was a separate company.

    Whit used the available site statistics to establish a final relative value measurement that identified which partnerships were highest or lowest performing and also identified which ones needed restructuring.

    After quantifying the value exchange between partners, Whit developed a package of five “what-if” analysis models to help CNN further understand partner value creation mechanics, restructure deals, negotiate favorable deal terms and inform staffing decisions based upon the restructured agreements.

    Quoting CNN’s Digital Partnerships Manager, Chris Finan: “Whit has changed our approach to partnerships: taking us from a reactive stance to a more proactive and strategic approach… [His] model will help us determine the success and revenue implications for each of our partners across CNN.com.”

    Katherine Gordon MurphyKatherine Gordon Murphy

    Company: UPS
    Category: Hypothesis testing, Pivot Tables, Monte Carlo simulations

    Kate interned at UPS where she performed a strategic assessment of the international dangerous goods (IDG) shipping market. To date there is no central tracking of the movement of dangerous goods and UPS wanted to know the size, potential and its share in the IDG market. To estimate the market size, Kate used a global trade database and worked with experts to estimate what percentile range of each commodity was hazardous. She then ran a Monte Carlo simulation which produced risk profiles (histograms) of the complete IDG market size.

    After analyzing the external environment of international dangerous goods shipping, Kate looked at the role it plays internally for UPS. UPS views IDG as a strategic product that 1) leads to incremental non-niche shipping and 2) attracts clients that are more profitable than “average” UPS clients. Interestingly, these hypotheses had never been tested. Kate analyzed UPS’ practices related to IDG and discovered that a “natural experiment” occurred in the data between 2008 and 2010: some of the clients that were pre-approved for IDG shipping by UPS never used the service. Kate used pivot tables and hypothesis testing to compare the differences in shipping volumes and profitability between the pre-approved clients who shipped IDG and those who did not. While the results must remain confidential, Kate’s work led to very interesting conversations about the future of the IDG product offering and plans for its expansion.

    Ilya TeslenkoIlya Teslenko

    Company: Navy Federal Credit Union
    Category: Time Series Analysis, Linear regression, Optimization

     Over the summer Ilya worked in the Lending Department to improve the forecasting models of losses in consumer portfolio. The current approach was to first forecast the outstanding balance of the portfolio, and then use another model to forecast the loss as a percentage of the balance. This forecasted loss percentage was applied to forecasted portfolio balance to get the absolute loss number.

    Ilya first attempted to create a single model that would forecast the absolute loss, but that attempt was not successful. He therefore kept the general structure with two models unchanged.

    To forecast the outstanding balance he built separate models for the three main segment of the portfolio, Consumer loans, NAVCheck and Leasing, using time-series ARIMA models with auto-correlation, seasonality and trends. Aggregating the prediction of his models yielded a noticeable improvement in the quality of the forecast for outstanding balance. Interestingly, however, when Ilya applied the existing loss percentage model forecasts to his new outstanding balance prediction, the resulting forecast of the absolute loss became less accurate. As Ilya discovered, this was because the first existing model over-forecasted the balance, while the second under-estimated the loss percentage, so that when taken together they produced a forecast that was “just right.”

    Ilya therefore also redid the model that forecasted the loss percentage. The existing model for losses was weighting the past 3 years of historical data equally. Ilya used Solver to find optimal weights minimizing the sum of squared errors; as expected Solver gave more weight to the recent data. These weights were used for the modified model for forecasting the loss percentage.

    Ilya’s new model for outstanding balance combined with modified model for the loss percentage proved to produce a much better forecast in terms of squared errors than the current model in place. Ilya’s model was launched in test-run as a candidate to replace the current one; its performance will be evaluated at the end of the year.

    Brandon SmitBrandon Smit Resized

    Company: Amazon.com
    Categories: Linear Regression, Optimization

    Over the summer, Brandon Smit worked for the Shoes In-Stock Management team at Amazon.com on two projects: the Site Merchandise Forecasting Model and the Liquidation Model. 

     In the first project, he analyzed sales data for approximately 10,000 shoes and built four linear regression models to predict 20-week demand at the style-color-size-width level, one for each of the four demand segments. When tested on a holdout sample, his models outperformed those historically used by Amazon.com in terms of forecasting accuracy (as measured by Mean Absolute Percentage Error, or MAPE). Brandon’s models are currently used to inform procurement decisions.  

    In the second project, Brandon analyzed the economics of a liquidation decision. He built a model to determine at what point an item should be withdrawn from sale and sold to liquidators to recoup a fraction of the procurement cost. The model used a VBA macro that performed a Solver optimization across thousands of unique products to calculate the breakeven period demand and number of units to liquidate.  

    Brandon’s recommendations led to the liquidation of nearly 100,000 shoes; that freed up about 5% of warehouse capacity and provided capital to invest in products for the upcoming holiday season.

    Charles (Chip) HoggeChip Hogge

    Company: Deloitte
    Categories: Pivot Tables, Optimization

     Chip Hogge worked with the domestic expatriate operations of a leading health insurance provider. The client did not have an accurate picture of its organizational processes or how resources were allocated across 15 business units and 10 global locations. In order to transition to a more competitive cost structure without sacrificing service levels, the client needed to determine ”who is where” and ”who does what” to facilitate an organizational and business process redesign. The original employee-level data were stored in an Excel file and tagged with cost center information. Chip used iterative lookup functions to assign process categorizations and outsourcing priority to each employee. Several pivot tables were then used to organize the data and provide different views of the overall organization, including by line of business and process, location and employee type. Chip then developed an optimization model that minimized compensation spending based on numerous constraints established through interviews with Business Unit leaders across the organization. The client used this information and recommendations to initiate vendor discussions and develop a reorganization plan consistent with its strategic priorities. These initiatives had an immediate impact on profitability by reducing compensation spending by 30% and total SG&A spending by 10%.

    Stefan TalmanStefan Talman

    Company: Delta Airlines
    Category: Regression

    Stefan Talman worked in Delta Airlines’ Revenue Management department. Because its existing inventory demand forecasting system was based on historical data, it couldn't capture/plan for the demand upswings for new large events. For example, because the Super Bowl is in Honolulu (airport code HOU) this year, demand forecasting for flights to HOU based solely on historical data would be clearly inaccurate. Stefan used multiple regression to describe the change in revenue for the days before and after an event (accounting for day-of-week and seasonal effects) using a set of independent variables (event size, market share, airport size, event type, etc.) for some 150 events in various market types (e.g., hub and spoke) and event types.  The final model was based on event size adjusted by market share and airport size (both in seats/week). Stefan’s model was adopted by the inventory management team and will continue to be refined as new large events occur.

    Bryan LieberBryan Lieber

    Company: Target
    Categories: Pivot Tables

    Although Target had high-level chain-wide visibility into the future profitability of its Pharmacy business, it did not have a good understanding of how the Pharmacy P&L varied across each of its nearly 1,600 stores. Using pivot tables, Bryan Lieber analyzed historical prescription ("script") sales data by store to understand what store characteristics (store size, age, region/geography, etc.) had the biggest impact on sales growth. Ultimately, he developed a sales-maturity curve based on age for each of Target's four regions. This allowed him to create five years of annual sales projections for each of Target's approximately 1600 pharmacies. Bryan combined these projections with store-specific financial data and long-term industry trends to develop a model that projects annual profitability for each of the next five years on a per-store basis. He then segmented the store population into four classifications based on their profitability and growth levels: Stars, Cash Cows, Questions, and Dogs/Sub-Optimals. This segmentation enabled much more granular reporting about how each of these various segments contributed to chain profitability, particularly highlighting the large gap between the best- and worst-performing stores. Finally, Bryan reviewed the time-of-day purchasing patterns for the "Dog/Sub-Optimal" stores to identify candidates for the firm’s hour reduction initiative and estimated the annual EBIT improvement from this initiative. 

    Going forward, Target will use the Pharmacy Segmentation model in three ways. First, segments can be incorporated into monthly reporting to provide additional visibility into Pharmacy profitability. Second, the model will be used to refine the target list of stores for hour reduction programs and calculate potential savings. Third, the Pharmacy team will use the segmentation to be more selective when rolling out new strategic initiatives. Rather than deploying new strategies to all stores in a given geography, with the help of Bryan’s model, they can target the type(s) of stores in the areas that best match their objectives.

    Geoffrey CoffieldGeoff Coffield

    Company: BCG
    Categories: Data Reporting, Pivot Tables

    Geoff Coffield worked with the manufacturing IT group of a large pharmaceutical company on two projects. In the first project, he was tasked with figuring out how and where money was being spent on contract employees and identifying opportunities for savings. From the company’s disorganized data, he created large pivot tables and charts using translation matrices. From the results, he was able to determine how, where, and by whom money was being spent on contract employees and to a certain extent what the employees were doing. Based on that analysis, he identified an addressable opportunity of about $18 million in annual spending and developed an approach to realize these savings. This realization process is currently under way and on track. 

    In his second project, Geoff supported a global reorganization affecting thousands of stakeholders around the world. He built a multidimensional employee survey to track progress across the different operating groups and geographies. He then created a comprehensive dashboard that allowed the client to analyze data across relevant dimensions and identify opportunities for improvement. In particular, because the client’s manager who will own this moving forward is not experienced with Excel, Geoff had to automate nearly every aspect of the process to the point that the manager need only copy and paste data into the dashboard template. Since then, the survey and the dashboard have been used twice to help identify a number of opportunities the company is currently pursuing.