by Laura Dempsey and Tim Duggan — Originally published in the November/December 2016 issue of FMJ
Bare material, labor and equipment costs account for 79 percent of total construction costs. With U.S. commercial construction starts estimated at US$712 billion in 2016, this translates to a US$562 billion expense line item nationally this year alone. There is, then, a clear need for diligent management of construction material and labor costs.
Commercial construction companies and real estate owners use commercially available construction cost data to obtain current material and labor cost estimates for new construction and renovation budgeting. They have depended upon economic forecast providers to support future cost budgeting and capital planning of these expenses. In the last decade, these stakeholders have increasingly come to realize that traditional forecasting methods do not meet their needs to predict costs at the local market or individual material level. This is not surprising. Traditional forecasting methods were never designed to provide robust forecasts at such a micro market level.
To address this shortcoming, predictive data breaks from tradition with a data mining approach. This family of processes and analyses has evolved from a mix of classical statistical principles, more contemporary computer science and machine learning methods, and engineering and decision science techniques. It is a robust methodology that takes advantage of recent increases in computer power, data visualization techniques, and updated statistical procedures to find patterns and determine causative relationships in data. Measures of these drivers and their relationships to each other and to construction costs, along with their associated lead or lag times, are then represented in a statistical algorithm that predicts future values for a defined material and location.
Recent construction cost research has shown this approach, known in academic literature as “multivariate causal modeling,” to be more accurate for predicting construction material cost. An additional strength of this approach is that it can identify turning points in construction cost before they occur. That capability is precisely what the industry is increasingly demanding, at the same time that data mining approaches to decision-making and strategic planning have become prevalent and expected across all industries.
Data and methodology
Best practice in data mining methodology involves the use of at least two datasets — training and validate datasets. Ideally, data in each dataset is unique; no overlap or reuse of data across the two datasets should be allowed. This is the data mining gold standard for developing accurate, robust algorithms that predict well for future time periods in ‘out-of-sample’ datasets, i.e., data previously unseen by the predictive algorithm.
Each dataset is composed of a “target” – the variable values to be predicted – and “features” (also known as “dimensions”), the set of input variables that have been selected as potential predictors. The training dataset consists of historical data for a given time period, and the validate dataset consists of a unique dataset of historical data for a different time period. In all cases commercially available construction cost data (either material or labor cost) is the target, and a set of general economic, construction industry-specific and material-specific indicators are features.
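The target/feature structure described above can be sketched in a few lines of Python. The indicator names (`ppi_steel`, `housing_starts`) and every value are illustrative placeholders, not data from the article:

```python
# Sketch of the training/validate dataset structure described above.
# All series values here are invented placeholders, not real cost data.

def make_dataset(target, features):
    """Pair a target series with its feature series, aligned by period."""
    n = len(target)
    assert all(len(f) == n for f in features.values()), "features must align with target"
    return {"target": target, "features": features}

# The training and validate periods must not overlap (no data reuse):
train = make_dataset(
    target=[1.8, 2.1, 2.4, 2.0],                       # quarterly YoY % change in a material cost
    features={"ppi_steel": [0.5, 0.9, 1.1, 0.8],       # material-specific indicator
              "housing_starts": [2.0, 2.2, 1.9, 1.7]}, # construction-industry indicator
)
validate = make_dataset(
    target=[1.5, 1.9, 2.2, 2.6],                       # a different, unseen time period
    features={"ppi_steel": [0.4, 0.7, 1.0, 1.2],
              "housing_starts": [1.8, 2.1, 2.3, 2.0]},
)
```

Keeping the two periods disjoint is what makes the validate dataset a genuine out-of-sample test of the trained algorithm.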
Data mining techniques require a minimum of 40-50 data points for acceptable accuracy. Electronically recorded cost data was likely unavailable before the turn of the millennium. However, assuming the switch to digital records occurred around that time, a substantial amount of data should still have been recorded in the subsequent years. For instance, if you began recording electronically in 2002 and measure through 2015, you will have yielded 12 year-over-year percent change data points, the metric chosen as best for modeling. These 12 data points could then be statistically “expanded” to quarterly values, resulting in 48 data points, sufficient for predictive modeling. (Expansion converts time series data values from one sampling interval or frequency to another and interpolates interim values.) This expanded dataset of 2002-2015 cost data would be the target in training datasets.
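The yearly-to-quarterly expansion can be sketched with simple linear interpolation — a minimal stand-in for whatever interpolation method a commercial tool would actually apply, with toy numbers in place of real cost data:

```python
def expand_to_quarterly(annual):
    """Linearly interpolate annual values to quarterly values.

    Each consecutive pair of annual points is bridged by four quarterly
    steps, so n annual values expand to 4*n quarterly values (the final
    year is extended flat). This mirrors the 12 -> 48 expansion above.
    """
    quarterly = []
    for i in range(len(annual)):
        start = annual[i]
        end = annual[i + 1] if i + 1 < len(annual) else annual[i]
        for q in range(4):
            quarterly.append(start + (end - start) * q / 4)
    return quarterly

yoy_annual = [2.0, 3.0, 1.0]  # toy year-over-year % changes
print(expand_to_quarterly(yoy_annual))
# -> [2.0, 2.25, 2.5, 2.75, 3.0, 2.5, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0]
```

Twelve annual points run through the same function yield the 48 quarterly points the article describes.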
Subsequently, data from 1990-2001 could be manually extracted from paper records, digitized and used as the target for a validate dataset. That data could likewise be expanded to 40 data points for validating the algorithms produced by the training effort.
Data mining techniques could then be used to select the most promising economic, construction industry and material-specific indicators as features for predictive models. Multivariate causal statistical modeling could then be used to develop a predictive algorithm based upon these indicators.
Consistent with data mining principles, multiple predictive analytic approaches should be selected and executed for this effort: autoregression, stepwise OLS (Ordinary Least Squares) regression, stepwise MLE (Maximum Likelihood Estimation) regression, ARIMA (AutoRegressive Integrated Moving Average) and VARMAX (Vector Autoregressive Moving Average with exogenous regressors). Candidate models should consist of three to seven features. The final predictive algorithm will be the best model surviving the training and validate phases from these five individual modeling efforts, or an ensemble model, which is a composite of the best models.
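A toy sketch of this train-then-validate selection, using closed-form least squares and only two of the five model families named above (an AR(1) autoregression and a single-feature OLS regression); every number is invented for illustration:

```python
import math

def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def rmse(pred, actual):
    """Root-mean-square error between predicted and actual series."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

# Toy training and validate series (cost target plus one exogenous indicator).
train_cost  = [1.8, 2.1, 2.4, 2.0, 2.2, 2.5]
train_steel = [0.5, 0.9, 1.1, 0.8, 1.0, 1.2]
val_cost    = [2.3, 2.6, 2.4, 2.7]
val_steel   = [1.1, 1.3, 1.0, 1.4]

# Candidate 1: AR(1) — predict cost from its own prior value.
a1, b1 = fit_ols(train_cost[:-1], train_cost[1:])
# Candidate 2: OLS on an exogenous indicator (a hypothetical steel PPI).
a2, b2 = fit_ols(train_steel, train_cost)

# Score both candidates on the held-out validate period and keep the winner.
err_ar  = rmse([a1 + b1 * y for y in val_cost[:-1]], val_cost[1:])
err_ols = rmse([a2 + b2 * x for x in val_steel], val_cost)
best = "AR(1)" if err_ar < err_ols else "OLS"
```

A production effort would fit all five families (including ARIMA and VARMAX, which need dedicated estimation routines), but the selection logic — train on one period, score on an unseen one, keep the survivor — is the same.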
Predictive data in use
Real estate (build to suit and tenant fit-out)
When real estate negotiations take place, several factors are considered, with site selection and cost of land at the top of the list. Predictive data dives deeper, providing a view into labor availability and material price increases in the market that could support building in the identified location. Conversely, it can reveal that the land is within budget but that the cost to build on the site could push the project over budget.
Furthermore, real estate departments often negotiate tenant fit-outs and build-to-suit projects. These negotiations include locking in the price of certain items, such as concrete for curbing and sidewalks, up to two years in advance. If the market price of concrete has risen by the time the project begins, the locked-in price puts the overall project under budget before construction even starts. Predictive data allows real estate divisions to evaluate markets for labor availability and accurately predicted material prices, negotiating better future contracts and increasing ROI for long-term project success.
Design and construction
One of the big challenges for design and construction teams is managing the budget presented by either the architecture or contractor teams. More importantly, the D&C teams need to understand cost discrepancies in order to accurately predict future costs as the time to build and engage contractors approaches. All too often a mix of numbers comes from the field and from contractors for the price to build, and the big questions for any owner remain: Does it align with the predicted budget? What is the variance? Where are the differences?
Predictive data has been used by clients to more accurately predict the cost of new builds and renovations up to 36 months before the project breaks ground. The ability to have predictive data that accounts for real market conditions (amount of construction versus labor availability) and commodity price impacts on material prices has proved a critical insight in managing the budget from the design through to construction phases.
Facilities and maintenance
There are two main functions where predictive data has provided additional insight for the facilities department. The first is in providing a more accurate operating and maintenance budget upstream. The common practice is a rolling three- to five-year budget plan in which the facility manager must prioritize projects and estimate future costs for budgeting. What is missing for the FM is an understanding of which market conditions will affect those budgets, so that projects can be prioritized effectively and budgets predicted accurately.
While FMs typically have some form of life cycle data or condition assessment data to assist in the budgeting process, the ability to accurately predict future costs is problematic at best. Predictive data users have incorporated predictive data to understand the impact of material prices on their budgets up to 36 months in advance, which allows them to adjust the timing of major projects. For example, an FM plans to replace the roofs on several buildings, a project planned two years out, but labor availability predictors indicate that 18 months out is the ideal time for a more cost-effective project. Predictive data has allowed FMs to better forecast budget needs, as well as time the market for project execution to maximize budget dollars. The end result allows the FM to redirect saved budget dollars to deferred maintenance backlogs.
Ultimately, the core value of using accurate predictive data at the material, labor and equipment level is the unprecedented ability it affords owners and FMs to understand the future costs of projects. FMJ
- Calculated from historical RSMeans data.
- “10 Construction Industry Trends to Watch in 2016,” Construction Dive News, Emily Peiffer, Jan. 4, 2016.
- “Analysis of Construction Cost Variations using Macroeconomic, Energy and Construction Market Variables,” Seyed Shahandashti, Dissertation at Georgia Institute of Technology, 2014.
- “Development of a Regression Model to Predict Preliminary Engineering Costs,” NCSU Dept. of Civil Engineering White Paper, Hollar, Arocho; Hummer, Liu and Rasdorf, 2010.
- “Empirical Tests for Identifying Leading Indicators of ENR Construction Cost Index,” Construction Management and Economics, Ashuri, Shahandashti and Lu, 2012.
- “Using Intelligent Techniques in Construction Project Cost Estimation,” Advances in Civil Engineering, Elfaki, Alatawi and Abushandi, 2014.
- “Volatility Forecast of Construction Cost Index Using General Autoregressive Conditional Heteroskedastic Method,” Journal of Construction Engineering and Management, Joukar and Nahmens, 2015.
NOTE: Ensemble modeling is the process of running two or more related but different analytical models and then synthesizing the results into a single model in order to improve the accuracy of the final algorithm. Data mining research shows ensemble models to often be more robust.
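A minimal sketch of the synthesis step described in the note, here a simple average of aligned forecasts from two hypothetical candidate models (real ensembles often weight each model by its validation accuracy instead of averaging equally):

```python
def ensemble(predictions):
    """Average aligned forecasts from several models into one composite."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]

# Toy quarterly forecasts from two candidate models of the same cost series.
ar_forecast  = [2.0, 2.25, 2.5]
ols_forecast = [2.5, 2.75, 2.0]
print(ensemble([ar_forecast, ols_forecast]))  # -> [2.25, 2.5, 2.25]
```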
Tim Duggan is director of cost analytics with RSMeans, a Gordian company. In this role he advises clients in the corporate and institutional sectors. Recent projects include developing a forecasting model for predictive cost data, and designing an interactive dashboard for building product manufacturers to monitor and evaluate specification influence in the marketplace. Duggan earned a bachelor’s degree in mechanical engineering at New Jersey Institute of Technology and began his career as an engineer with the U.S. Army Research and Development Command.
Laura Dempsey, MBA, is director of consulting and analytics for RSMeans of the Gordian Group. She has been consulting for more than 15 years with Fortune 500 companies and U.S. state and federal agencies on the use of data, life cycle costs, models and predictive analytics in construction, capital planning, and operations and management. Her clients include Firestone, Georgia Pacific, EMA, the U.S. Department of Energy, Wells Fargo, Taco Bell and Tim Hortons. Dempsey holds bachelor’s degrees in both philosophy and biology from Agnes Scott College, as well as a Master of Business Administration from Lake Forest Graduate School of Management.