A video-based tracking system for football player analysis using Efficient Convolution Operators
Nguyen Hong Thinh1, Hoang Hong Son1,2, Chu Thi Phuong Dzung1, Vu Quang Dzung3, and Luu Manh Ha1,2∗
Abstract—Computer vision has been applied in sports analysis
to meet the demands of the media and to support training. This paper presents a system for tracking multiple football players in video streams. The task poses three challenges: the players appear relatively small in the video and move chaotically; processing must be fast enough for the analyzed data to be reported during the match while remaining sufficiently accurate; and the hardware must be highly mobile. To overcome these challenges, we apply Efficient Convolution Operators (ECO) as the core tracking method to track the targets on two synchronized laptops, and the data are then merged in a post-processing stage. In addition, interactive user functions are provided to help the operators correct failed tracks. The tracking method is quantitatively evaluated on videos from professional football matches at two resolution settings. The number of user interactions needed to correct failed tracks and the processing time are chosen as the evaluation criteria. The results show that ECO tracking outperforms several well-known tracking methods, with fewer than one tracking loss per two minutes on average at a processing rate of 12-17 fps. In conclusion, the proposed system is a promising tool for football player tracking and statistical analysis in practice.
Index Terms—ECO tracking, football player tracking system,
user interface functions
I. INTRODUCTION
Recently, the use of artificial intelligence (AI) has become increasingly popular. In sport, many smart systems based on image and video processing and machine learning techniques have been designed to assist managers in monitoring and analyzing the actions and behaviours of each player, as well as to provide important statistical information about the match [1]. In a match, information about each player's movement plays an important role. Using AI technology, such a system provides statistical information such as the traveled distance, range of activity and dynamic level of the players. Information obtained from these analyses also allows the coach to assess the physical and tactical performance of the team and to plan each player's personal development.
There are several player tracking systems applied in football [2]–[5]. During the game, these systems collect high-quality video data from fixed cameras around the football pitch [3], [5]. The systems then perform detailed processing and provide analyzed results after a few hours or a few days. Because they are not limited by processing time, these systems usually offer high accuracy and need little technician involvement. However, for online purposes such as TV sports news, the match analysts usually need results immediately after each round in order to comment for the audience. In such cases, these systems are not feasible. Moreover, the systems require multiple fixed cameras around the pitch, which produces a large amount of data to be processed and consequently demands costly cameras and computational systems.
∗ Corresponding author: Manh Ha Luu, halm@vnu.edu.vn
1 FET, VNU University of Engineering and Technology, Hanoi, Vietnam
2 AVITECH, VNU University of Engineering and Technology, Hanoi, Vietnam
3 R&D Department, Ecovision, Hanoi, Vietnam
In football matches, the players run chaotically on the field with randomly changing speed, so their trajectories are highly complicated. In addition, the lighting conditions may vary considerably, which affects the video quality. The position of a player relative to the camera also changes, resulting in considerable changes in the shape of the player, who already appears small in the video. Moreover, the players often cross each other, so tracking algorithms may perform incorrectly, resulting in loss of tracking.
In this paper, we present a system that addresses the problems of tracking players on the football field. The proposed system tracks a pre-determined number of players on the field and provides analyzed results during the break after each round. The system is designed to be mobile, compact and accurate. The main contributions of this work are:
• We successfully apply the ECO tracking method [6] to the problem of tracking football players and embed it in the system.
• We evaluate the performance of the tracking method and compare it with several other well-known tracking methods using a scheme that counts user interactions.
The remainder of this paper is organized as follows. In the next section, we review prior work on tracking and identification systems for football players. Our proposed system is described in detail in Section III. Section IV presents the experiments and evaluation performed on practical football matches to test and verify the proposed framework. Finally, Section V contains discussions and conclusions on the main findings.
II. LITERATURE REVIEW
Tracking multiple football players on the field is an application of Multiple Object Tracking (MOT), a popular problem in computer vision. Generally, MOT consists of multiple single-object trackers operating at the same time, each accounting for an individual object. It is usually solved using one of two categories of single-object tracking methods: tracking-by-detection and learning-to-track.
In the first strategy, objects of a specific category, such as humans or cars, are detected, and this detection is the key factor for the tracking. There are a variety of methods to detect targets, such as background subtraction, colour segmentation, and trained detectors based on visual features such as HOG features or deep-learning features. The detected objects are then linked to form the trajectories of the tracked targets. Because the predictions are associated with the detections, an incorrect detection may lead to incorrect tracking.
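The linking step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: it assumes a simple greedy nearest-neighbour association, and the function name link_detections and the gating distance max_dist are hypothetical. It also shows why a spurious detection close to a track end point would be linked to that track.

```python
import math


def link_detections(tracks, detections, max_dist=50.0):
    """Greedily attach each detection to the nearest track end point.

    tracks: dict track_id -> list of (x, y) points (modified in place)
    detections: list of (x, y) points found in the new frame
    max_dist: hypothetical gating distance in pixels
    Returns the detections that were not linked to any track.
    """
    # Collect every (distance, track, detection) pair inside the gate.
    pairs = []
    for tid, pts in tracks.items():
        for j, det in enumerate(detections):
            d = math.dist(pts[-1], det)
            if d <= max_dist:
                pairs.append((d, tid, j))
    pairs.sort()

    # Closest pairs are linked first; each track and detection is used once.
    used_tracks, used_dets = set(), set()
    for _, tid, j in pairs:
        if tid in used_tracks or j in used_dets:
            continue
        tracks[tid].append(detections[j])
        used_tracks.add(tid)
        used_dets.add(j)
    return [det for j, det in enumerate(detections) if j not in used_dets]
```

Under this scheme the tracker has no appearance model of its own, so the quality of the trajectories is bounded by the quality of the detector.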
Several football-player tracking systems, such as [5], [7]–[9], belong to this type. In [5], Schlipsing et al. presented a real-time football analysis system based on background subtraction to detect the players and on the Kalman filter to track them. They also used the SVM technique to classify the football players in order to connect each target to its trajectory. The Kalman filter requires continuous detection to correct the tracking; however, the background subtraction method is sensitive to lighting conditions, which may lead to incorrect player detection. Furthermore, the Kalman filter is mainly suitable for tracking objects with linear movement, which is not the case for the chaotic movement of football players. Baysal et al. [8] introduced Sentioscope, a football player tracking system that tracks the players in real time. The system uses a particle-filter-based method which effectively handles the occlusion problem and has been shown to outperform several other tracking methods. The accuracy of the tracking method depends heavily on the resolution of the particles on the football field: the higher the resolution, the better the accuracy. However, a high particle resolution comes at the cost of computation time, which may compromise real-time operation when the method is implemented on a low-cost computational system. In [9], Kim et al. described a tracking method for multiple football players. The method uses background subtraction and edge information to detect the players. A multi-scale sampling strategy with the block matching method is used to find the best match among the detected players.
The other strategy, learning-to-track, has recently become a trend. Several well-known methods belong to this type, such as Multiple Instance Learning (MIL) [10], Generic Object Tracking Using Regression Networks (GOTURN) [11], trackers based on correlation filters [12]–[14], and ECO tracking [6]. In general, learning-to-track methods can be categorized into online learning and offline learning. In offline mode, model-based trackers, which are designed to track a specific class of objects, are trained before the actual tracking takes place. Because these trackers are trained offline, they are static and can only track a specific class of target. In addition, when tracking multiple small objects of the same type, such as the players of the same team, the tracker cannot resolve ambiguities or re-assign missed or occluded targets. Therefore, this strategy may learn from inaccurate information. Those errors accumulate and lead to the tracking drift problem.
In online learning, the trackers are typically trained entirely online, starting from the first frame of a video, using foreground and background patches around the targets. These patches are used to train an object-background classifier, which is then used to estimate the new location of the target object in the next frame [13], [15].
Fig. 1. The structure of the proposed system. The upper path shows the processing blocks for the left camera, and the lower path shows the processing blocks for the right camera. The tracks are merged and analyzed on laptop 1 before being sent to the data center.
A survey on football player tracking can be found in [16]. It reports that “Detection-based trackers gave poor performance since detection was not reliable”. To solve the problem of false detections in tracking, recent research [17], [18] proposed combining extra information from future frames in the video sequence to identify the target. However, such a non-causal system is not suitable for online tracking applications. Thus, we exclude detection-based trackers from the application of football player tracking. Online tracking is suitable for outdoor tracking problems such as variable lighting conditions. Moreover, because the appearance of the players to be tracked is typically unknown in advance (e.g., different shirt colors for each game), offline training is not feasible. For these reasons, we apply the ECO tracking method in the proposed football player tracking system.
III. PROPOSED SYSTEM
A. Proposed system in general
The main structure of the system is shown in Fig. 1. In terms of hardware, the system consists of two parts: the vision part, with two cameras (Left view and Right view), and the processing part, with two laptops (see Section III-C). In our framework, we use video streams collected from two independent cameras that stay fixed during the match (Fig. 2). The cameras have wide-angle lenses so that each covers half of the playing field, and a 2.5K resolution so that players are captured clearly at every position on the pitch. The core processing of the system is installed on the two laptops, each of which processes the video data from one camera. Unlike the system in [5], we do not merge the Left and Right videos to create a single, large, full view of the football field, because the large processing area slows down the processing speed dramatically. Furthermore, it is difficult to observe the players running in full view on a single laptop screen. Therefore, we process each video independently before combining the track lists on a single laptop. The core of the processing part, as shown in Fig. 1, includes several main modules: Video buffer, Tracking, Manual ReID, and Merge track-list.
Fig. 2. IP cameras mounted on a tripod cover the whole football field. Both are connected to the processing laptops via a LAN.
The operating principle of the processing system can be summarized as follows:
• Video buffer: The system is designed to operate online with as small a delay as possible. The system is also semi-automatic and still needs supervisors to correct tracking errors. Thus, instead of using a real-time framework, which may cause many errors, or waiting for the full video of the game, which is not suitable for online use, we split and save the video stream into two-minute segments. Processing is carried out continuously on the split videos. In this way, the analyzed results can be updated after each two-minute video segment is completed.
• Tracking and Manual ReID: The main purpose of the system is to track players and provide statistical results, such as traveled distance and position. For this, we build a semi-automatic system using a stop-and-go method. The ECO tracking method [6] is used as the core of the tracking system. Details of the ECO algorithm are described in Section III-B. In case of tracking loss, manual re-identification (ReID) is performed through the user interface to correct the lost track.
• Merge track-list Left-Right: The Tracking and ReID steps above are performed on each half of the pitch, not on the whole field. This gives the technicians a large view of each player; however, it also has a limitation when players move from one half of the pitch to the other. To solve this problem, we combine the result lists of the Left view and the Right view based on the track-list information, including the time and location of each tracked player. This process is carried out by the Merge Track-list L-R module after the processing of each video segment ends.
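The merging and statistics steps listed above can be sketched in Python as follows. The paper does not specify the merging rule, so this is only an illustration under stated assumptions: both views report positions in a shared pitch coordinate system (in meters), timestamps are synchronized, and when both views report the same player at the same time the two positions are averaged. The names merge_track_lists and traveled_distance are hypothetical.

```python
import math
from collections import defaultdict


def merge_track_lists(left, right):
    """Merge two per-view track lists into one track per player.

    left, right: dict player_id -> list of (t, x, y), with t in seconds and
    (x, y) in a shared pitch coordinate system (an assumption of this sketch).
    """
    merged = {}
    for pid in set(left) | set(right):
        by_time = defaultdict(list)
        for t, x, y in left.get(pid, []) + right.get(pid, []):
            by_time[t].append((x, y))
        track = []
        for t in sorted(by_time):
            pts = by_time[t]
            # Average the positions when both views see the player at time t.
            x = sum(p[0] for p in pts) / len(pts)
            y = sum(p[1] for p in pts) / len(pts)
            track.append((t, x, y))
        merged[pid] = track
    return merged


def traveled_distance(track):
    """Sum of straight-line steps between consecutive track points."""
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(track, track[1:]))
```

For example, a player seen at (0, 0) and (3, 4) in the Left view and then at (3, 10) in the Right view yields a merged track with a traveled distance of 11 m.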
B. ECO Tracking
The ECO tracking algorithm, which is based on correlation filters, was first introduced as a generic visual object tracker [6]. We adapt the method to football player tracking. The main ideas of the tracking scheme are summarized in Fig. 3. Typically, a correlation operation is used to locate an object and its translations in the video sequence. To reduce the computational complexity, the correlation is performed in the frequency domain using the FFT. A cosine window is usually applied to reduce the spatial boundary effects of the selected regions around the target. Furthermore, since the background may adversely affect the correlation computed between blocks of image intensity, discriminative correlation filters are applied to construct a classifier that distinguishes the target from its background. In the ECO tracking method, the Minimum Output Sum of Squared Error (MOSSE) filter [12] is used. Instead of using image intensity directly, visual features (such as the ColorName feature [13] and the HOG feature [19]) can be extracted to describe the image better. Additionally, based on Continuous Convolution Operators for Tracking (CCOT) [14], the target features are learned using multi-resolution feature maps in a continuous sequence. Thus, the ECO tracking method is an extension of CCOT. In contrast to the CCOT algorithm, ECO tracking does not update the model on every frame; instead, it combines features from N frames, and the final model is refined using a Gaussian mixture model (GMM) and Conjugate Gradient iterations [20]. In summary, ECO tracking has several important properties that make it efficient for football player tracking:
• Learning target features from multiple tracks in the video sequence, so ECO tracking is suitable for objects that deform and change size due to fast movement.
• Applying the GMM to group similar targets into a few components, which decreases the number of learned classifiers and thus reduces the computational cost of the tracking step.
• Utilizing a factorized convolution operator to reduce the dimensionality of the feature space.
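The frequency-domain correlation described above can be illustrated in one dimension with the following Python sketch. It is not the ECO implementation: it only shows how a cosine (Hann) window and an element-wise product of spectra give a correlation response whose peak indicates the target shift. To stay self-contained it uses a direct O(n^2) discrete Fourier transform in place of an FFT library, and all function names are illustrative assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                for m in range(n))
        out.append(s / n if inverse else s)
    return out


def hann(n):
    """Cosine window that tapers the patch borders towards zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def correlate(search, template):
    """Circular cross-correlation via the correlation theorem.

    response[k] = sum_m search[(m + k) % n] * template[m]
    """
    S, T = dft(search), dft(template)
    product = [s * t.conjugate() for s, t in zip(S, T)]
    return [v.real for v in dft(product, inverse=True)]


def locate_shift(search, template, use_window=True):
    """Return the shift at which the correlation response peaks."""
    n = len(template)
    w = hann(n) if use_window else [1.0] * n
    s = [a * b for a, b in zip(search, w)]
    t = [a * b for a, b in zip(template, w)]
    response = correlate(s, t)
    return max(range(n), key=response.__getitem__)
```

In a two-dimensional tracker the same idea is applied to image patches or feature maps, and the peak of the response map gives the translation of the target between frames.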
C. Hardware configuration
The hardware of the system contains the following components:
• Two Dahua cameras, model DH-IPC-HFW4631, with H.264 stream encoding, 30/25 fps at FullHD/2.5K resolution, a 2.7-13.5 mm adjustable lens and a shutter speed of 1/10 to 1/100,000 s. The two cameras are mounted on a tripod 1 meter apart (see Fig. 2).
• Two Dell Precision 7510 laptops, each with an Intel Core i7-6820HQ CPU, 16 GB of RAM, a 512 GB SSD and a 15.6-inch Full HD screen. The two laptops connect to the two cameras via a LAN, which also enables a connection to the data center via the Internet.
Fig. 3. General workflow of the ECO tracking method. The convolution between the image and the filter is performed by element-wise multiplication in the frequency domain, using the Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT). The filter weights are trained on multiple target features from multi-resolution feature maps in a continuous sequence.
IV. TRACKING EVALUATIONS AND RESULTS
A. Data
We use ten videos, each 4 to 12 minutes long, from three matches recorded at Hang Day stadium during the 2018-2019 V-LEAGUE tournament. The videos were obtained from two fixed cameras looking at the two sides of the pitch (see Fig. 2). The original resolution of the videos is 2.5K at 25 frames per second. Seven of them were resized to FullHD in order to examine how tracker performance depends on video resolution. The videos were recorded under several conditions, such as in the afternoon and in the evening, and in clear and cloudy weather. As in the system pipeline, all of the videos are split into consecutive 2-minute segments.
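The segmentation described above can be expressed as frame ranges. The following Python sketch assumes the 25 fps frame rate and 2-minute segment length stated here; the function name split_into_segments is hypothetical, and keeping a shorter final segment is an assumption, since the handling of leftover frames is not described.

```python
def split_into_segments(total_frames, fps=25, segment_seconds=120):
    """Return (start, end) frame ranges, end exclusive, for consecutive segments.

    The last segment may be shorter than the others (an assumption of this
    sketch).
    """
    step = fps * segment_seconds  # frames per segment, 3000 by default
    return [(start, min(start + step, total_frames))
            for start in range(0, total_frames, step)]
```

For a 12-minute video at 25 fps (18,000 frames), this yields six segments of 3,000 frames each.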
B. Evaluation
To evaluate the performance of the tracking algorithms, we implemented several different methods and tested them on the actual video data obtained from the football matches. The tracking algorithms compared are Median Flow [21], KCF [22], MIL [10], Boosting [23] and ECO tracking [6]. Of these, KCF and ECO tracking were re-implemented as described in the original papers using C++ in a Linux environment. For the other methods, we directly use the source code in the OpenCV library. The algorithms are validated on two video resolutions: FullHD and 2.5K.
The purpose of this evaluation is to check how well the tracking algorithms fit the design of the system. Hence, two criteria are considered: whether the speed of the tracking algorithm is suitable for the hardware of the system, and how effective the algorithm is under match conditions on different videos. First, we use the frame rate as an evaluation parameter. The tracking methods are tested on the two video resolutions and with one to three players to track. Second, although the system allows semi-automatic operation, the level of human interaction should be as low as possible. Therefore, we use the degree of semi-automation as the second evaluation criterion [8]. For this, the average number of track losses per two-minute video represents the accuracy of a tracking method: the fewer the track losses, the better the method. In practice, for each tracking loss, the operator has to perform a manual ReID for the track at the time it occurs. As a result, the number of manual ReID interactions can be counted as the number of track losses.
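The second criterion can be computed from a log of manual ReID interactions. The following Python sketch assumes that each interaction is logged with its time in seconds from the start of the video; the function name average_track_losses and the log format are hypothetical.

```python
import math


def average_track_losses(reid_times, video_seconds, segment_seconds=120):
    """Average number of manual ReID interactions per video segment.

    reid_times: times (in seconds) at which the operator re-identified a
    lost track; each one is counted as one track loss.
    Returns (average per segment, list of per-segment counts).
    """
    n_segments = math.ceil(video_seconds / segment_seconds)
    counts = [0] * n_segments
    for t in reid_times:
        # Clamp so an event at the very end falls into the last segment.
        index = min(int(t // segment_seconds), n_segments - 1)
        counts[index] += 1
    return sum(counts) / n_segments, counts
```

For an 8-minute video with interactions at 30 s, 250 s and 255 s, the per-segment counts are [1, 0, 2, 0] and the average is 0.75 track losses per two-minute segment.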
For the first evaluation, the frame rates are shown in Table I. When the number of simultaneously tracked players increases from one to three, the frame rate of the system decreases significantly for all tracking methods. In most cases, Median Flow has the highest frame rate, followed by KCF and then Boosting. The ECO tracking algorithm has reasonable performance, varying from 12 fps to 17 fps (about half of the original video frame rate). The MIL algorithm has the lowest frame rate. The video resolution also affects system performance. For higher-resolution videos, the frame rate increases slightly when running the Median Flow, KCF and Boosting algorithms, whereas the processing slows down significantly when running the MIL or ECO tracking algorithms.
Fig. 4. The interface showing analyzed video frames obtained from the two cameras at the Left side and the Right side of the football field. For each team, the number of each player is shown together with the traveled distance and the number of manual ReID interactions.
TABLE I
AVERAGE PROCESSING SPEED (FPS) OF THE TRACKING ALGORITHMS
Tracking methods    Number of tracks (FullHD)             Number of tracks (2.5K)
                    1 player   2 players   3 players      1 player   2 players   3 players
Fig. 5. Illustration of ECO tracking, Median Flow, KCF, MIL and Boosting on the same football player during accelerated movement.
For the second evaluation, the average numbers of track losses in two-minute videos are reported in Fig. 6. The results demonstrate that ECO tracking performs best among the compared tracking methods, with fewer than one track loss per two minutes on average. In addition, there are fewer track losses in 2.5K video than in FullHD video, which can be explained by the fact that the higher resolution provides more detailed features for the tracking methods.
An example video sequence with all of the tracking methods applied to an accelerating player is illustrated in Fig. 5. MIL and Median Flow cannot follow the player after 10 frames; Boosting and KCF start losing the target at frame 180, while ECO tracking still follows the target.
V. DISCUSSIONS AND CONCLUSION
We have built a system for football player tracking based on the criteria of mobility, accuracy, and online operation. The core tracking method, the ECO tracker, was quantitatively evaluated and compared with other well-known tracking methods using video data from several professional football matches. The results showed that, with the same number of tracked players, the ECO tracker performs best in terms of track losses. However, a drawback of the system is that it cannot operate in real time: the processing rate for three-player tracking, i.e., 12 fps, is lower than the frame rate of the input video stream. This drawback can be mitigated by dropping frames, but doing so may reduce the tracking accuracy. Still, we expect that once the learning stage of ECO tracking is computed on a compact GPU, the performance of the system will improve. However, when all of the players on the field need to be tracked at once, the method seems far from practical application because of the expensive computation. Moreover, the system relies entirely on manual ReID for correction. Although it was shown that for three-player tracking there is less than one track loss every 2 minutes on average, manual correction is inconvenient when several players are lost at the same time. Nevertheless, we believe that, with the current development of deep learning techniques for person recognition, automatic player identification will achieve good results soon. Another drawback of the proposed system is that the two mounted cameras only provide a view from one bleacher. When occlusion occurs, there is not sufficient information to predict the positions of the overlapped players. Two cameras placed on the opposite bleacher could compensate for this drawback; however, this may increase the complexity of the system. In conclusion, the proposed system has been tested in practice and shows potential for application in football data analysis.
Fig. 6. Average number of track losses in 2-minute video segments. The tracked players are initialized randomly, with one to three trackers at a time, and we count the number of track losses during each run on the video segments.
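The real-time limitation and the frame-dropping remedy discussed in this section can be quantified with a short Python sketch. It uses only the frame rates reported in the paper; the function names are hypothetical, and the uniform "keep every k-th frame" rule is an assumption, since the frame-dropping scheme is not specified.

```python
import math


def processing_time(segment_seconds, video_fps, tracker_fps):
    """Seconds needed to process one segment when every frame is tracked."""
    return segment_seconds * video_fps / tracker_fps


def min_frame_stride(video_fps, tracker_fps):
    """Smallest k such that tracking every k-th frame keeps up with the stream."""
    return max(1, math.ceil(video_fps / tracker_fps))
```

With a 25 fps input and a 12 fps tracker, a two-minute segment takes 250 s to process, and the tracker would have to process only every third frame to keep pace with the stream. At 17 fps, every second frame would suffice.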
ACKNOWLEDGMENT
This work has been supported by VNU University of Engineering and Technology under project number CN18.14. We would like to thank the VTVcab company for supporting us in data collection and experiments at Hang Day stadium.
REFERENCES
[1] G. Thomas, R. Gade, T. B. Moeslund, P. Carr, and A. Hilton, “Computer vision for sports: Current applications and research topics,” Computer Vision and Image Understanding, vol. 159, pp. 3–18, 2017.
[2] W.-L. Lu, J.-A. Ting, K. P. Murphy, and J. J. Little, “Identifying players in broadcast sports videos using conditional random fields,” in CVPR 2011, pp. 3249–3256, IEEE, 2011.
[3] C.-W. Lu, C.-Y. Lin, C.-Y. Hsu, M.-F. Weng, L.-W. Kang, and H.-Y. M. Liao, “Identification and tracking of players in sport videos,” in Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, pp. 113–116, ACM, 2013.
[4] A. Al-Ali and S. Almaadeed, “A review on soccer player tracking techniques based on extracted features,” in 2017 6th International Conference on Information and Communication Technology and Accessibility (ICTA), pp. 1–6, IEEE, 2017.
[5] M. Schlipsing, J. Salmen, M. Tschentscher, and C. Igel, “Adaptive pattern recognition in real-time video-based soccer analysis,” Journal of Real-Time Image Processing, vol. 13, no. 2, pp. 345–361, 2017.
[6] M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg, “ECO: Efficient convolution operators for tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6638–6646, 2017.
[7] W.-L. Lu, J.-A. Ting, J. J. Little, and K. P. Murphy, “Learning to track and identify players from broadcast sports videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 7, pp. 1704–1716, 2013.
[8] S. Baysal and P. Duygulu, “Sentioscope: a soccer player tracking system using model field particles,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 7, pp. 1350–1362, 2015.
[9] W. Kim, S.-W. Moon, J. Lee, D.-W. Nam, and C. Jung, “Multiple player tracking in soccer videos: an adaptive multiscale sampling approach,” Multimedia Systems, vol. 24, no. 6, pp. 611–623, 2018.
[10] B. Babenko, M.-H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 983–990, IEEE, 2009.
[11] D. Held, S. Thrun, and S. Savarese, “Learning to track at 100 fps with deep regression networks,” in European Conference on Computer Vision, pp. 749–765, Springer, 2016.
[12] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, “Visual object tracking using adaptive correlation filters,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2544–2550, IEEE, 2010.
[13] M. Danelljan, F. Shahbaz Khan, M. Felsberg, and J. Van de Weijer, “Adaptive color attributes for real-time visual tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1090–1097, 2014.
[14] M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, “Beyond correlation filters: Learning continuous convolution operators for visual tracking,” in European Conference on Computer Vision, pp. 472–488, Springer, 2016.
[15] S.-H. Bae and K.-J. Yoon, “Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1218–1225, 2014.
[16] M. Manafifard, H. Ebadi, and H. A. Moghaddam, “A survey on player tracking in soccer videos,” Computer Vision and Image Understanding, vol. 159, pp. 19–46, 2017.
[17] A. A. Butt and R. T. Collins, “Multi-target tracking by lagrangian relaxation to min-cost network flow,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1846–1853, 2013.
[18] L. Leal-Taixé, M. Fenzi, A. Kuznetsova, B. Rosenhahn, and S. Savarese, “Learning an image-based motion context for multiple people tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3542–3549, 2014.
[19] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” 2005.
[20] P. Li, D. Wang, L. Wang, and H. Lu, “Deep visual tracking: Review and experimental comparison,” Pattern Recognition, vol. 76, pp. 323–338, 2018.
[21] Z. Kalal, K. Mikolajczyk, and J. Matas, “Forward-backward error: Automatic detection of tracking failures,” in 2010 20th International Conference on Pattern Recognition, pp. 2756–2759, IEEE, 2010.
[22] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “High-speed tracking with kernelized correlation filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2014.
[23] H. Grabner, M. Grabner, and H. Bischof, “Real-time tracking via on-line boosting,” in BMVC, vol. 1, p. 6, 2006.