Categories: Computer Vision

How to run YOLO on a CCTV live feed

In this blog we explore how to run YOLO, a very popular computer vision algorithm, on a CCTV live feed. YOLO (You Only Look Once) is a popular object detector, remarkably fast and efficient. There is a lot of documentation on running YOLO on video from files, USB cameras or Raspberry Pi cameras. This series of blogs describes in detail how to set up a generic CCTV camera and run YOLO object detection on the live feed. A lot of the code used here came from the articles listed at the end of this blog; in case you are interested in finding out more about YOLO, they are well worth a read.

Set up a CCTV camera with RTSP

An earlier blog describes in detail how to set up a generic CCTV camera with a live RTSP feed. Note down the RTSP URL, as we will need it in the later stages.

Install Python and OpenCV

We will use Python 3.6 and OpenCV 4 in this walkthrough. Ensure your computer has both, in the appropriate versions. If you have never installed OpenCV, please refer to this guide; it documents installation of OpenCV on several different operating systems.

Install virtualenv for managing python libraries

I strongly recommend using virtualenv to manage your Python development workflows, especially if you work on multiple Python projects simultaneously. For more details on this package, refer to its documentation.

pip3 install virtualenvwrapper
source virtualenvwrapper.sh   # path may vary, e.g. /usr/local/bin/virtualenvwrapper.sh
mkvirtualenv env1

Install necessary python libraries with pip

We will need NumPy, imutils and OpenCV to run YOLO on a live CCTV feed (the time and os modules used below are part of the Python standard library, and cv2 is provided by the opencv-python package). The required libraries can be installed using the command below.

pip3 install numpy imutils opencv-python

Download the YOLOv3 weights and config files

The weights, config and class-names files needed to run YOLOv3 can be downloaded from the Darknet website. Make a directory called yolo-coco and keep the files there.
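A small helper can confirm all three downloads are in place before you run the main script; this is a convenience sketch, not part of the original code:

```python
import os

YOLO_PATH = "yolo-coco"
REQUIRED = ["yolov3.weights", "yolov3.cfg", "coco.names"]

def missing_files(base=YOLO_PATH, required=REQUIRED):
    """Return the list of required YOLO files not found under base."""
    return [f for f in required if not os.path.isfile(os.path.join(base, f))]

# Example: print(missing_files()) -> [] once all three files are downloaded
```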

Python code

Create a new Python file and copy the following code into it. Replace the string <RTSP_URL> with the RTSP URL for your camera.

# import the necessary packages
import numpy as np
import argparse
import imutils
import time
import cv2
import os
from imutils.video import FPS

# paths and tunable parameters
RTSP_URL = "<RTSP_URL>"      # replace with your camera's RTSP URL
YOLO_PATH = "yolo-coco"      # directory holding the weights/config/names files
OUTPUT_FILE = "output.avi"   # annotated video written to disk
CONFIDENCE = 0.5             # minimum detection probability
THRESHOLD = 0.3              # non-maxima suppression threshold

# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join([YOLO_PATH, "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# initialize a list of colors to represent each possible class label
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
 dtype="uint8")

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([YOLO_PATH, "yolov3.weights"])
configPath = os.path.sep.join([YOLO_PATH, "yolov3.cfg"])

# load our YOLO object detector trained on COCO dataset (80 classes)
# and determine only the *output* layer names that we need from YOLO
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# initialize the video stream, pointer to output video file, and
# frame dimensions
vs = cv2.VideoCapture(RTSP_URL)
fps = FPS().start()
writer = None
(W, H) = (None, None)


# loop over frames from the video file stream
while True:
 # read the next frame from the file
 (grabbed, frame) = vs.read()

 # if the frame was not grabbed, then we have reached the end
 # of the stream
 if not grabbed:
  break

 # if the frame dimensions are empty, grab them
 if W is None or H is None:
  (H, W) = frame.shape[:2]

 # construct a blob from the input frame and then perform a forward
 # pass of the YOLO object detector, giving us our bounding boxes
 # and associated probabilities
 blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
  swapRB=True, crop=False)
 start = time.time()
 layerOutputs = net.forward(ln)
 end = time.time()

 # initialize our lists of detected bounding boxes, confidences,
 # and class IDs, respectively
 boxes = []
 confidences = []
 classIDs = []

 # loop over each of the layer outputs
 for output in layerOutputs:
  # loop over each of the detections
  for detection in output:
   # extract the class ID and confidence (i.e., probability)
   # of the current object detection
   scores = detection[5:]
   classID = np.argmax(scores)
   confidence = scores[classID]

   # filter out weak predictions by ensuring the detected
   # probability is greater than the minimum probability
   if confidence > CONFIDENCE:
    # scale the bounding box coordinates back relative to
    # the size of the image, keeping in mind that YOLO
    # actually returns the center (x, y)-coordinates of
    # the bounding box followed by the boxes' width and
    # height
    box = detection[0:4] * np.array([W, H, W, H])
    (centerX, centerY, width, height) = box.astype("int")

    # use the center (x, y)-coordinates to derive the top
    # and and left corner of the bounding box
    x = int(centerX - (width / 2))
    y = int(centerY - (height / 2))

     # update our list of bounding box coordinates,
     # confidences, and class IDs
     boxes.append([x, y, int(width), int(height)])
     confidences.append(float(confidence))
     classIDs.append(classID)

 # apply non-maxima suppression to suppress weak, overlapping
 # bounding boxes
 idxs = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE,
  THRESHOLD)

 # ensure at least one detection exists
 if len(idxs) > 0:
  # loop over the indexes we are keeping
  for i in idxs.flatten():
   # extract the bounding box coordinates
   (x, y) = (boxes[i][0], boxes[i][1])
   (w, h) = (boxes[i][2], boxes[i][3])

   # draw a bounding box rectangle and label on the frame
   color = [int(c) for c in COLORS[classIDs[i]]]
   cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    text = "{}: {:.4f}".format(LABELS[classIDs[i]],
     confidences[i])
   cv2.putText(frame, text, (x, y - 5),
    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

 # check if the video writer is None
 if writer is None:
  # initialize our video writer
  fourcc = cv2.VideoWriter_fourcc(*"MJPG")
  writer = cv2.VideoWriter(OUTPUT_FILE, fourcc, 30,
   (frame.shape[1], frame.shape[0]), True)

   # some information on processing the first frame
   elap = (end - start)
   print("[INFO] single frame took {:.4f} seconds".format(elap))

 # write the output frame to disk
 writer.write(frame)

 # show the output frame
 cv2.imshow("Frame", cv2.resize(frame, (800, 600)))
 key = cv2.waitKey(1) & 0xFF
 #print ("key", key)
 # if the `q` key was pressed, break from the loop
 if key == ord("q"):
  break

 # update the FPS counter
 fps.update()

# stop the timer and display FPS information
fps.stop()

print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
# release the file pointers
print("[INFO] cleaning up...")
if writer is not None:
 writer.release()
vs.release()
cv2.destroyAllWindows()

The program is now ready to run. The live feed from the camera comes in via RTSP; each frame is run through the YOLO object detector, and identified objects are highlighted with labelled bounding boxes. The program can be stopped by pressing the 'q' key at any time.

Final Notes

I ran this program on my MacBook Air laptop, which has no GPU, and got about 1 FPS. With a GPU or an accelerator, the frame rate can be increased significantly, up to real-time, full-FPS object detection. Alternatively, you can choose to run detection only on every 10th or 20th frame if you don't have GPU acceleration.
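That frame-skipping idea can be sketched as a small guard around the expensive forward pass; process_every is an illustrative parameter, and the commented loop shows where the guard would slot into the main script above:

```python
def should_detect(frame_index, process_every=10):
    """Run the (expensive) YOLO forward pass only on every Nth frame."""
    return frame_index % process_every == 0

# In the main loop you would keep a frame counter and guard the
# blobFromImage/net.forward block with this check, reusing the last
# detections for the skipped frames:
#
# frame_index = 0
# while True:
#     (grabbed, frame) = vs.read()
#     if should_detect(frame_index):
#         ...run YOLO on this frame...
#     frame_index += 1
```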


References

  1. Darknet
  2. YOLO object detection with OpenCV
Published by
Praveen Pavithran
