Computer Vision Internship with Kami Vision

After graduating from NTU in May 2022, I was more than excited for my year-long break before proceeding to my graduate studies. I had planned to relax and do some light work for the first 5-6 months, and then look for an internship lasting a few months in the latter half to gain meaningful experience that would help me later with job applications.

In November 2022, I started applying for internships and got a few invites for coding tests and interviews. Eventually, I selected Kami Vision for my internship as it is a startup specialising in Computer Vision solutions, making it perfectly aligned with my interests and career goals.

I started work in December 2022 and continued until April 2023, after my original three-month internship was extended by two months. Over my 5 months at Kami, I worked on a variety of projects, reflecting the fast-paced nature of working at a startup.

The first thing I worked on was a fine-tuning pipeline for the YOLOv7 pose model for human action recognition. I had not worked with YOLOv7 before, since it was the SOTA model at the time, so I needed a while to familiarise myself with the code repository and training procedure. Once I had done so, I created a small custom dataset to test the code on and successfully developed a simple pipeline to fine-tune the model on custom data.

The next task assigned to me was to explore methods for generating IR lighting images from RGB images. This was a data augmentation approach the team was considering to improve their current model evaluation results, and it was a high-priority task. I started by exploring existing open-source approaches and evaluating them on the quality of the generated image and the time taken to perform the conversion. Eventually, I found an approach that could convert an image of one type into another, also referred to as image-to-image mapping, reasonably quickly. However, the provided code only worked with a single image of each type, so I modified it to work with large datasets and presented it to the team. This approach was then used to augment many datasets during the course of my internship, and became the method of choice for generating IR lighting images.

Next, I worked on developing a custom object tracking algorithm for one of the solutions offered by the company. I was tasked with adding object tracking to reduce the application's false alert rate. After exploring a few alternatives, I developed the algorithm using an open-source object tracking library called motpy, which provides single object tracking. I built a multi-object tracker on top of that to create the algorithm, which was then tested and deployed. I also worked with another member of the team to adapt this algorithm for another project.
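To illustrate why tracking helps with false alerts, here is a minimal sketch in plain Python. It is not the algorithm I built at Kami and it does not use motpy; the names `SimpleTracker`, `min_hits`, `max_misses` and `iou_threshold` are hypothetical and chosen only for this example. The idea is that a detection only counts as alert-worthy once it has been matched to the same track in several frames, so a detection that flickers into a single frame never raises an alert.

```python
from dataclasses import dataclass
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: tuple
    hits: int = 1    # number of frames in which this track was matched
    misses: int = 0  # consecutive frames without a matching detection


class SimpleTracker:
    """Greedy IoU tracker (illustrative only, hypothetical parameters)."""

    def __init__(self, iou_threshold=0.3, min_hits=3, max_misses=2):
        self.iou_threshold = iou_threshold
        self.min_hits = min_hits
        self.max_misses = max_misses
        self.tracks = []
        self._ids = count()

    def update(self, detections):
        """Match this frame's boxes to tracks and return confirmed tracks."""
        unmatched = list(detections)
        for track in self.tracks:
            # Pick the detection that overlaps this track the most.
            best = max(unmatched, key=lambda d: iou(track.box, d), default=None)
            if best is not None and iou(track.box, best) >= self.iou_threshold:
                track.box = best
                track.hits += 1
                track.misses = 0
                unmatched.remove(best)
            else:
                track.misses += 1
        # Drop tracks that have been missing for too long.
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        # Every leftover detection starts a new, unconfirmed track.
        for det in unmatched:
            self.tracks.append(Track(next(self._ids), det))
        # Only tracks matched in enough frames, and seen in this frame, may alert.
        return [t for t in self.tracks if t.hits >= self.min_hits and t.misses == 0]
```

Feeding the tracker one stable box over three frames, plus a spurious box that appears in only one frame, returns a single confirmed track; the spurious box never reaches `min_hits`.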

Finally, I spent the last few months of the internship working on a nascent project idea presented by the team lead: using a transformer-based model to perform human action recognition. The idea was to take a sequence of images, pass them to the transformer and output an action label. Since I had never worked with transformers before, I had to start from scratch by reading and learning about them before doing any coding. I spent a few days learning about transformers from various articles, research papers and YouTube videos. Then, I came across an adaptation of transformers for Computer Vision applications known as Vision Transformers. On studying them further, I learned that they are used for image classification tasks, which I thought I could extend to classifying sequences of images. After a few failed attempts, I decided to set this approach aside and try a different one.

My supervisor then suggested I try to predict simple sine waves instead of working directly with images. I took his advice and spent a few days developing a network that could predict sine waves. The architecture looked promising, but I needed to perform a classification task, not predict the next items in a sequence. Then, I came across an approach that used transformers to classify sequences of numbers. This was a breakthrough, since images are also sequences of numbers, which is what I needed to classify. I was able to incorporate that approach into mine by writing my own custom dataloader, training and inference scripts, in addition to the model architecture, in PyTorch. I also created my own dataset using the open-source AVA dataset.
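For readers curious what a transformer that classifies sequences can look like in PyTorch, here is a minimal sketch. It is not the model I wrote during the internship: the class name `SequenceClassifier`, the layer sizes, the learned positional embedding and the mean pooling are all assumptions made for illustration. It also assumes each frame has already been turned into a fixed-length feature vector, for example by flattening it or by running it through some image encoder.

```python
import torch
from torch import nn


class SequenceClassifier(nn.Module):
    """Transformer encoder followed by a classification head (illustrative)."""

    def __init__(self, feature_dim, num_classes, d_model=64, nhead=4,
                 num_layers=2, max_len=64):
        super().__init__()
        # Project each per-frame feature vector to the model width.
        self.input_proj = nn.Linear(feature_dim, d_model)
        # Learned positional embedding so the model can use frame order.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # One logit per action label.
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x has shape (batch, seq_len, feature_dim), with seq_len <= max_len.
        seq_len = x.size(1)
        h = self.input_proj(x) + self.pos_embed[:, :seq_len]
        h = self.encoder(h)
        # Average over the time axis to get one vector per sequence.
        return self.head(h.mean(dim=1))


if __name__ == "__main__":
    model = SequenceClassifier(feature_dim=32, num_classes=5)
    clips = torch.randn(8, 16, 32)  # 8 clips, 16 frames, 32 features per frame
    logits = model(clips)
    print(logits.shape)
```

The difference from next-value prediction is only in the output: instead of emitting one value per time step, the encoder output is pooled into a single vector and mapped to class logits, which can then be trained with `nn.CrossEntropyLoss`.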

Beyond the technical work, this internship allowed me to connect with an incredible and friendly team that was extremely welcoming towards me and the other student interning with me. I spent a lot of time talking with and learning from everyone in the team, each of whom had their own background, thoughts and experiences to share. In addition, the daily team update meetings provided a platform for discussing ongoing projects and the concerns of each team member, including interns. Each member would present their current progress and report any results or roadblocks, which would then be discussed at length.

It was a fantastic 5 months with the Kami India AI team. I gained a lot of experience with PyTorch and with writing custom models and algorithms, and improved my communication and presentation skills.

Keywords
  • Computer Vision
  • Internship
  • Python
  • PyTorch
  • OpenCV
  • Video Processing