Data Collection
The completion of Task 1, data collection, was critical to the remainder of the study; it provided the foundation needed to complete Tasks 2 and 3. The customer was the primary data supplier for this study and provided the Team with data recorded by each vessel’s port engineer. Because each vessel has its own port engineer, the data contained inconsistencies. Three data categories were provided to the Team: Monthly Consumption and Operating Hours, Power vs. Speed, and the Vessel’s Ship Log (reference Appendix A for information on all the data types provided within each category). The Monthly Consumption and Operating Hours data contained approximately 700 rows (about 120 data points for each vessel), the ship log data contained more than 42,000 rows, and the power vs. speed table included six relationships between the power and speed of the vessel. During Task 2, the Team combed through the data and identified anomalies, which were presented to the customer for concurrence before removal. The following data types were used during the study:
Data Analysis
Task 2 consisted of data manipulation, data analysis, and model development. The purpose of the mathematical model was to calculate vessel fuel consumption based on speed and sea state. The main tools used during Task 2 were Microsoft Access, Excel, and Minitab: the model was built in Access; Excel was used for regression, statistical analysis, and plotting; and Minitab was used for statistical analysis.
Outlier Analysis
The collected data comprised two types of data recorded at different time intervals. The first was the monthly fuel consumption for underway (UW) and not-underway (NUW) operation, with a single entry per vessel per month. The second included the sea state and average vessel speed, which were recorded multiple times a day for each vessel.
Monthly Fuel Consumption
Prior to any data analysis and model development, the Team manipulated the data to ensure consistency and to prepare the database. The first step in preparing the data for the mathematical model was to identify outliers. To determine which mathematical approach to use, the Team first needed to determine whether the data was normally distributed, so an Anderson-Darling normality test was performed. The test indicated that the data was not normally distributed for all vessels; the p-value was less than 0.05. Because the data was not normally distributed, the Team identified outliers by means of boxplots and fences. Boxplots depend on the median rather than the average of the data and can therefore handle non-normal data. Boxplots were created for each combination of operating mode (UW and NUW) and vessel, for a total of 12 boxplots. Fences, which aided in identifying outliers, were placed at a distance of 1.5 times the interquartile range (the difference between the third and first quartiles) beyond the quartiles; equations are below.
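In the usual boxplot (Tukey) convention, which matches the description above, the fences take the following form. The symbols $Q_1$, $Q_3$, and $\mathrm{IQR}$ are notation assumed here for the first quartile, third quartile, and interquartile range; they are not taken from the study itself.

```latex
\[
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{Lower fence} = Q_1 - 1.5\,\mathrm{IQR}, \qquad
\text{Upper fence} = Q_3 + 1.5\,\mathrm{IQR}
\]
```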
Thus anomalies are values that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile. If the calculated lower fence was negative, a value of zero was used instead. The following figures show the boxplots for UW and NUW fuel consumption.
Figure 3. Boxplots for UW Fuel Consumption (gal/hr)
Figure 4. Boxplot for NUW Fuel Consumption (gal/hr)
The following table identifies the lower and upper range of fuel consumption (gal/hr) data to be considered for each vessel during UW.
Table 1. Ranges of Consideration for UW Fuel Consumption
These values were used to determine outliers in the monthly fuel consumption data. Any data points outside of these ranges were not used in the mathematical model. The table below identifies the lower and upper range of fuel consumption (gal/hr) data to be considered for each vessel during NUW.
Table 2. Ranges of Considerations for NUW Fuel Consumption
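The ranges of consideration follow from the fence rule described earlier. The steps can be sketched as below, assuming Python's standard library; the function names and the sample readings are hypothetical, and the quartile convention (`statistics.quantiles` with its default "exclusive" method) is an assumption, since the study's tools may compute quartiles differently.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences; a negative lower fence is set to zero."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = max(0.0, q1 - k * iqr)  # the report replaces negative lower fences with zero
    upper = q3 + k * iqr
    return lower, upper


def within_range(values, lower, upper):
    """Keep only the values inside the range of consideration."""
    return [v for v in values if lower <= v <= upper]


# Hypothetical monthly fuel consumption readings (gal/hr), not study data.
sample = [40, 42, 44, 45, 46, 47, 48, 50, 52, 120]
low, high = tukey_fences(sample)
kept = within_range(sample, low, high)
```

With these made-up readings, the single extreme value falls above the upper fence and is dropped, while the rest are retained.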
Unlike the UW ranges, more of the NUW lower ranges were calculated to be negative and were therefore set to zero. As with the UW data, these values were used to determine the monthly fuel consumption outliers. For the NUW outliers, because the hotel load was assumed to be unaffected by the modification, the Team was able to estimate what the month’s average fuel consumption should have been. This estimate was developed through a time series moving average of like months: if a given month’s fuel consumption (e.g., March of one year) was identified as an outlier, the fuel consumption for the same month in all other years was averaged, with all years weighted equally and the outlier value excluded. This allows the estimated hotel load to account for any seasonal effects. The table below identifies the estimated hotel load for each vessel by month.
Table 3. Estimated Hotel Load (gal/hr)
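The like-month averaging described above can be sketched as follows. This is a minimal illustration, not the Team's Access implementation: the data layout, the function name, and the vessel label are hypothetical, and skipping every flagged value (not only the one being replaced) is an assumption.

```python
from statistics import mean


def like_month_estimate(readings, outliers, vessel, year, month):
    """Average the same calendar month across the other years, equally weighted.

    readings: dict mapping (vessel, year, month) -> NUW fuel consumption (gal/hr)
    outliers: set of (vessel, year, month) keys flagged as outliers
    """
    peers = [
        value
        for (v, y, m), value in readings.items()
        if v == vessel and m == month and y != year and (v, y, m) not in outliers
    ]
    # With no usable like months there is nothing to average.
    return mean(peers) if peers else None


# Hypothetical March readings for one vessel; the 2007 value is the flagged outlier.
readings = {
    ("Vessel A", 2005, 3): 10.0,
    ("Vessel A", 2006, 3): 12.0,
    ("Vessel A", 2007, 3): 50.0,
    ("Vessel A", 2008, 3): 14.0,
    ("Vessel A", 2006, 4): 30.0,  # a different month, ignored for March
}
outliers = {("Vessel A", 2007, 3)}
estimate = like_month_estimate(readings, outliers, "Vessel A", 2007, 3)
```

Because only like months are averaged, a seasonal pattern in the hotel load carries through to the estimate.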
The estimated hotel load replaced the outliers identified in the NUW data. The process used to calculate the estimated hotel load and the results were approved by the customer prior to incorporation.
Monthly Data (other)
The Team also performed outlier analysis (by means of boxplots) on the vessel speed data. The table below contains the results of the outlier analysis.
Table 4. Ranges of Consideration for Vessel Speed
Any calculated lower range values that were negative were replaced with a value of zero. Some of the calculated upper range speed values were determined to be too high for this type of vessel (“Initial Upper Range for Consideration” in the table above). Therefore, the customer suggested that the upper range for vessel speed not exceed 18 kts (“Final Upper Range for Consideration” in the table above). Typically, the T-AGS vessels have a maximum speed of 16 kts, but in perfect conditions (ideal wind speed/direction and ocean current), a maximum speed of 18 kts could be achieved. The Team verified that low speeds are expected when the vessels are surveying the ocean. When appropriate, an average speed of 0 kts identifies that the vessel was in NUW mode. While the Pathfinder has a lower range of 0.5 kts, average speeds of 0 kts were still used to identify UW. Any recorded speed greater than 18 kts was removed from the mathematical model. The final outlier analysis removed months that did not have sufficient data. The Team determined that this was necessary to provide a realistic analysis of monthly data: comparing months with 30 days of data to months with two days of data would not produce useful results. Therefore, the Team removed monthly data with less than 75 percent of its daily data. The percentage of daily data was based on the month’s actual number of days rather than an assumed 30 days per month. The table below provides the results of this analysis.
Table 5. Monthly Data Analysis
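The daily data requirement described above can be sketched as follows, assuming a hypothetical mapping from (year, month) to the number of days with recorded data. The function names and sample counts are illustrative only; the use of the month's actual length comes from the text.

```python
import calendar


def coverage(year, month, days_with_data):
    """Fraction of the month's actual calendar days that have data."""
    days_in_month = calendar.monthrange(year, month)[1]
    return days_with_data / days_in_month


def usable_months(day_counts, threshold=0.75):
    """Return the (year, month) keys whose coverage meets the threshold."""
    return [
        (year, month)
        for (year, month), days in sorted(day_counts.items())
        if coverage(year, month, days) >= threshold
    ]


# Hypothetical counts of days with data, not study data.
day_counts = {
    (2007, 2): 21,  # 21 of 28 days: exactly 75 percent, kept
    (2007, 3): 2,   # 2 of 31 days: removed
    (2008, 2): 22,  # 22 of 29 days (leap year): kept
}
kept_months = usable_months(day_counts)
```

The leap-year entry shows why the actual month length matters: 22 days is about 76 percent of a 29-day February but only about 73 percent of an assumed 30-day month.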
The data gathered in Task 1 contained eight years of data, for a total of 96 months. Each vessel had months with no data at all (i.e., no data was collected for that vessel during that timeframe); these are recorded in the “Months with No Data” column. After removing the monthly data with less than 75 percent of the daily data, the Team counted the number of months that were removed (recorded in the “Months Missing > 75% Data” column) and the total number of months that remained for model analysis (“Usable Months” column). The customer recommended that the Team analyze the effect of removing monthly data with less than 75 percent of the daily data. The analysis would determine the model’s sensitivity to the exclusion of months missing sizable amounts of data. The Team calculated the total variability and average variability that resulted from removing months with less than 65, 75, and 85 percent of daily data. The total data variability was calculated as the sum of squares, and the average data variability was calculated as the sample variance. Each analysis was performed on the recorded propulsion data. The figure below contains the results of the sum of squares analysis on the recorded propulsion data.
Figure 5. Sum of Squares for Recorded Propulsion Fuel Consumption
The results were as expected: using months with more data produced a smaller total variance. The Bowditch had the least data variability among all the vessels, while the Sumner had the greatest. Because the vessels did not all have the same number of months included in the analysis, the Team determined that total variability may not be an appropriate measure: the number of months could influence the total variability.^{[1]} The Team therefore determined the average data variability for each vessel. The figure below contains the results of this analysis on the recorded propulsion fuel consumption.
Figure 6. Sample Variance for Recorded Propulsion Fuel Consumption
The average data variability produced an inverted bell curve and appears to provide the analysis the Team wanted. It allows the Team to determine which daily data requirement yields the lower variability on average. On average, the 75 percent requirement had an average data variability similar to or less than that of the 65 and 85 percent requirements. Next, the Team determined the effect the daily data requirement had on the number of months available for analysis; reference the table below for results.
Table 6. Daily Percentage Requirement Effect on Usable Months
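The sensitivity comparison described above can be sketched as a loop over the three daily data requirements, reporting the usable month count alongside both variability measures. This is a sketch under stated assumptions: the function name, the input layout, and the sample values are hypothetical, and the sum of squares is taken to mean the sum of squared deviations from the mean.

```python
from statistics import mean, variance


def sensitivity(months, thresholds=(0.65, 0.75, 0.85)):
    """months: list of (coverage_fraction, fuel_gal_per_hr) pairs for one vessel."""
    results = {}
    for t in thresholds:
        kept = [fuel for cov, fuel in months if cov >= t]
        if len(kept) < 2:
            # Sample variance is undefined for fewer than two data points.
            results[t] = {"usable_months": len(kept),
                          "sum_of_squares": None,
                          "sample_variance": None}
            continue
        centre = mean(kept)
        results[t] = {
            "usable_months": len(kept),
            "sum_of_squares": sum((x - centre) ** 2 for x in kept),  # total variability
            "sample_variance": variance(kept),  # sum of squares / (n - 1)
        }
    return results


# Hypothetical (coverage, gal/hr) pairs, not study data.
months = [(0.90, 10.0), (0.80, 12.0), (0.70, 20.0), (0.95, 14.0)]
results = sensitivity(months)
```

Since the sum of squares equals the sample variance multiplied by (n - 1), it grows with the number of months included, which is why the sample variance is the fairer measure when vessels contribute different numbers of months.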
Changing the daily data requirement between 65 and 75 percent had a large impact on the number of usable months for the Henson and Mary Sears vessels (42 and 22 percent, respectively). All other vessels experienced less than a 10 percent decrease in the number of usable months when the daily data requirement was 75 percent instead of 65 percent. However, all vessels were impacted by increasing the daily data requirement from 75 to 85 percent, which resulted in a large decrease in the number of usable months. The Henson experienced the largest decrease in usable months (43 percent); however, its number of usable months was only seven with the 75 percent requirement, which was lowered to four with the 85 percent requirement. To complete the sensitivity analysis, the Team determined how the skeg modifications affected the recorded propulsion fuel consumption means for the 65, 75,^{[2]} and 85 percent daily data requirements. Those results are contained in the tables below.
Table 7. 65% Daily Data Requirement
Table 8. 75% Daily Data Requirement
Table 9. 85% Daily Data Requirement
Although the change in daily data requirement did result in changes to the overall average fuel consumption for each vessel, the 85 percent daily data requirement left the Henson with only one data point for certain data categories. With the exception of two vessels, the number of months excluded when using the 75 percent daily data requirement instead of the 65 percent requirement was minimal. Also, the average data variance for the 75 percent daily data requirement was similar to or less than that for the 65 and 85 percent requirements. With the results of this analysis, the Team was confident in using the 75 percent daily data requirement for model development.
Outlier Analysis Results
At the conclusion of the data manipulation, approximately 0.37 percent (132 of 35,534), 5.97 percent (44 of 737), and 19.95 percent (147 of 737) of the vessel speed, UW, and NUW data, respectively, were determined to be outliers. Approximately 50 percent or more of the monthly data was removed for each vessel due to lack of data. Sensitivity analysis indicated that the 75 percent threshold for determining which monthly data needed to be removed was appropriate; the calculated average variability for 75 percent was equal to or less than the average variability for 65 and 85 percent. Although the Team analyzed and removed identified outliers, the data may still contain outliers that could affect the results of the mathematical model. If outliers remain in the data, risks such as unrealistic results could arise.
[1] For example, when comparing the total fuel consumed by similar vehicles that are operated for different amounts of time, the vehicle that is in operation longer would most likely have the higher fuel usage. For an accurate comparison, fuel use is better analyzed as average gallons per hour, which provides a better understanding of which vehicle has the better fuel consumption.
[2] The Recorded Data Analysis section describes in greater detail the analysis performed on the 75 percent daily data requirement for recorded UW and propulsion data.