Many things can be done by detecting objects in real time, a form of video analysis that ranges from identifying a terrorist in a crowd to counting the vehicles in traffic. Real-time object detection is quite fascinating because the user sees the result at the same time as the video is being captured.

For real-time object detection, the video is processed in several stages. In the example above, the video is captured through the webcam and processed frame by frame: image preprocessing operations such as grayscale conversion, thresholding, or blurring are applied to each frame. The frame is then passed to the machine learning model (in this case, a model used for object detection), which identifies the objects and produces a predicted output frame that replaces the actual frame in the display.
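
As a rough sketch of this loop (assuming OpenCV is used for capture and preprocessing, and a placeholder `detect_objects` function stands in for the actual detection model), it could look like this:

```python
import cv2

def detect_objects(frame):
    """Hypothetical stand-in for the object detection model.
    A real model would return the frame annotated with predictions."""
    return frame  # placeholder

cap = cv2.VideoCapture(0)          # open the default webcam
while True:
    ret, frame = cap.read()        # read the video frame by frame
    if not ret:
        break

    # Example preprocessing: grayscale conversion and blurring
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Pass the preprocessed frame to the model and show its output
    # in place of the original frame
    output = detect_objects(blurred)
    cv2.imshow("Real-time object detection", output)

    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```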


A typical CNN gradually shrinks the feature map size and increases its depth in the deeper layers. The deep layers cover larger receptive fields and construct more abstract representations, while the shallow layers cover smaller receptive fields. By utilising this information, we can use shallow layers to predict small objects and deeper layers to predict big objects, since small objects don't need bigger receptive fields, and bigger receptive fields can be confusing for small objects.
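The shrinking of the feature maps can be seen with a toy backbone (a sketch, assuming PyTorch; the layer sizes and the 300x300 input are illustrative, not from any particular network):

```python
import torch
import torch.nn as nn

# Toy backbone: each stage halves the spatial size and increases the depth,
# so deeper feature maps correspond to larger receptive fields.
stages = nn.ModuleList([
    nn.Conv2d(3,   64,  kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64,  128, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),
])

x = torch.randn(1, 3, 300, 300)   # illustrative 300x300 input image
for i, stage in enumerate(stages):
    x = stage(x)
    print(f"stage {i}: feature map {tuple(x.shape[1:])}")
# Shallow stages keep high resolution (useful for small objects);
# deep stages are coarse but more abstract (useful for large objects).
```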

Key properties:

  • Bounding box predictions are made independently for each of these layers, and the final prediction is the union of all the predictions.
  • It is able to detect small objects as well, because independent detections are made from multiple feature maps (see the sketch after this list).
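
Here is a sketch (again assuming PyTorch) of how independent detection heads on feature maps of different resolution could be combined into one set of predictions. The feature map sizes, channel counts, and class/box numbers are illustrative, not taken from any particular implementation:

```python
import torch
import torch.nn as nn

num_classes, boxes_per_cell = 21, 4

# One small head per feature map: each cell predicts `boxes_per_cell` boxes,
# each with 4 coordinates and `num_classes` class scores.
head_shallow = nn.Conv2d(256, boxes_per_cell * (4 + num_classes), 3, padding=1)
head_deep    = nn.Conv2d(512, boxes_per_cell * (4 + num_classes), 3, padding=1)

# Illustrative feature maps: a high-resolution one from a shallow layer
# (small objects) and a low-resolution one from a deep layer (large objects).
feat_shallow = torch.randn(1, 256, 38, 38)
feat_deep    = torch.randn(1, 512, 10, 10)

def flatten_preds(p):
    # (N, A*(4+C), H, W) -> (N, H*W*A, 4+C): one row per predicted box
    n = p.shape[0]
    return p.permute(0, 2, 3, 1).reshape(n, -1, 4 + num_classes)

# The final prediction is the union of the boxes from all layers
preds = torch.cat([flatten_preds(head_shallow(feat_shallow)),
                   flatten_preds(head_deep(feat_deep))], dim=1)
print(preds.shape)  # boxes from both feature maps, concatenated
```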

In traditional machine learning, we train a model A on data and labels for a particular task or domain A. For another task or domain B, we again need labelled data to train a separate model B. For example, a model trained on images of cats and dogs will fail to detect pedestrians.

Transfer Learning allows us to store the knowledge gained in one task or domain A and reuse it in another related task or domain B. We achieve this by storing the weights of model A and initializing model B with the stored weights before training. The major advantage is that transfer learning allows us to train a deep learning model with relatively little data.
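
A minimal sketch of this idea, assuming PyTorch and torchvision (the file name, backbone, and number of new classes are only illustrative):

```python
import torch
import torchvision.models as models

# Model A: a network pretrained on a large dataset (here, ImageNet).
model_a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
torch.save(model_a.state_dict(), "model_a_weights.pth")     # store the knowledge

# Model B: the same architecture, initialized from model A's stored weights.
model_b = models.resnet18()
model_b.load_state_dict(torch.load("model_a_weights.pth"))

# Freeze the transferred layers and replace the final layer for the new task,
# so only the new head needs to be trained on the smaller dataset.
for param in model_b.parameters():
    param.requires_grad = False
model_b.fc = torch.nn.Linear(model_b.fc.in_features, 2)  # e.g. 2 new classes
```

Freezing the transferred layers is optional; with a bit more data, the whole network can be fine-tuned at a lower learning rate instead.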