Shivam Garg
Verified Expert in Engineering
Computer Vision Engineer and Developer
Shivam is a senior AI engineer with 4+ years of hands-on experience in deep learning and artificial intelligence. Proficient in various deep learning frameworks such as TensorFlow, PyTorch, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Furthermore, Shivam stands out for his extensive expertise in classical computer vision and machine learning.
Preferred Environment
Python, PyTorch, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Docker, LangChain, Large Language Models (LLMs), Machine Learning, Data Science, Image Generation, Chatbot, Chatbots, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Notion, APIs, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, 2D, JavaScript, Text to Speech (TTS)
The most amazing...
...generative AI model I've delivered uses Stable Diffusion and LLMs to animate stories from news articles and helped secure Y Combinator funding.
Work Experience
Senior AI Consultant
Self-employed
- Developed a Stable Diffusion model with ControlNet to convert sketches into photorealistic images conditioned on pose inputs. Tuned the cross-attention layers with LoRA to reduce the storage footprint of the trained model.
- Delivered a generative AI model using Stable Diffusion and LLMs, capable of generating animated stories from news articles, which secured Y Combinator fundraising for the client.
- Developed an approach to transform animal images into animated cartoons by training a GAN on unpaired animal images, leveraging the StyleGAN architecture, and enhancing the output with CLIP and a feature extractor.
- Built a system to convert 2D images of non-fungible tokens (NFTs) into 3D models using selective 3D inpainting via Stable Diffusion and depth estimation.
- Developed a text-to-art system using techniques such as fine-tuning, autoencoders, and prompt engineering, successfully generating visually appealing art from text descriptions.
- Created a system to detect and classify fake news in India using ML and natural language processing (NLP). Preprocessed text data, employed SetFit and long short-term memory (LSTM) models, and created an ensemble for precise identification.
- Built a tool that searches for similar patents in the United States Patent and Trademark Office (USPTO) database using OpenAI ada model embeddings via LangChain, with FAISS for improved indexing and search of the patent embeddings.
- Created an eCommerce product matching system by comparing visual embeddings from the CLIP model with OCR-derived textual embeddings via LLM (ada model), enhancing accuracy and efficiency.
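The embedding-based matching described in the patent search and product matching items above can be illustrated with a minimal sketch. It assumes the embeddings are already computed and uses a brute-force cosine similarity scan in plain Python as a stand-in for a FAISS index; the function names and the toy vectors are hypothetical and are not taken from the original projects.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k_similar(query, index, k=3):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings; real text or image embeddings have far more dimensions.
index = {
    "patent_a": [0.9, 0.1, 0.0],
    "patent_b": [0.0, 1.0, 0.0],
    "patent_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]
results = top_k_similar(query, index, k=2)
```

A dedicated index such as FAISS serves the same purpose as the linear scan here, but avoids comparing the query against every stored vector when the collection is large.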
AI Engineer 3
Avatarin Inc
- Created a system to assist human Kanji writing through imitation learning and OpenCV, using Kanji videos to generate Kanji images and predict poses for robotic arms.
- Automated the processing of health records and invoices for Yale University, leveraging OCR and OpenCV to extract text from diverse health documents and convert them to digital formats.
- Implemented a model that detects suspicious activity at airports using VideoMAE. It prioritized high accuracy, low latency, and efficient deployment on the client's Linux server.
- Built shot detection for table tennis (TT) games for the World Table Tennis Organization, using YOLOv5 and OpenCV for object detection and VideoMAE for shot recognition.
Senior AI Engineer
AlphaICs
- Implemented a motion transfer system using a first-order model, achieving high-quality motion transfer between faces while preserving the identity and facial expressions of the target face.
- Built a quantization software development kit (SDK) for 4-bit and 8-bit quantization, enabling the efficient implementation and optimization of deep learning models on Edge (CPU-based) hardware, which enhanced performance and capabilities.
- Benchmarked different computer vision and generative models with a custom quantization and optimization SDK for IoT and custom Edge devices.
- Worked on brain image segmentation using deep learning, training neural networks to accurately identify and classify structures in brain images linked to Alzheimer's disease.
- Rolled out a 3D object detection and tracking system for autonomous vehicles using lidar data and the VoxelNet algorithm, enhancing the vehicles' perception and tracking capabilities in a 3D environment.
- Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high accuracy in detecting objects in infrared images and providing reliable identification and tracking capabilities.
- Created a satellite image segmentation system for detecting agricultural fields using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making processes.
Machine Learning Engineer
UnrealAI
- Developed and deployed real-time yoga pose estimation on Android using OpenPifPaf, achieving accurate results for Indian yoga poses. Optimized inference speed and converted the model into TensorFlow Lite format for seamless integration.
- Created a topic model using the LDA and NMF algorithms to extract latent topics from text corpora, and applied clustering algorithms to group similar topics, providing a better understanding and organization of the text documents.
- Built a computer vision system for accurately detecting items in the kitchen, with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
- Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.
Experience
Legal Law Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4
Personalized Art Generation Bot
NFT Image to Immersive 3D
Selective 3D inpainting involves the advanced process of filling in missing or damaged regions in the 2D images, resulting in a complete and visually appealing 3D representation. This technique helps to enhance the overall quality and realism of the generated 3D models.
Depth estimation is another critical component of the system as it enables the determination of the spatial depth information from 2D images. This depth information is essential for creating a sense of depth and perspective in the resulting 3D models.
By using Stable Diffusion for the inpainting step, the system produces consistent, high-quality 3D representations of the NFTs from their 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interaction experience in various applications, ranging from virtual galleries to augmented reality environments.
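As an illustration of how depth information turns a 2D image into 3D geometry, here is a minimal sketch that back-projects a depth map into 3D points with a pinhole camera model. The camera parameters (`fx`, `fy`, `cx`, `cy`), the function name, and the tiny depth map are assumptions made for the example; the project description does not specify the actual camera model or reconstruction method.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of depth values) into (X, Y, Z) points.

    Assumes a pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. These values are hypothetical here.
    """
    points = []
    for v, row in enumerate(depth):      # v is the pixel row
        for u, z in enumerate(row):      # u is the pixel column
            if z <= 0:                   # skip pixels with missing or invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# Toy 2x2 depth map; 0.0 marks a pixel with no depth estimate.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Pixels without a valid depth are the kind of gaps that an inpainting step would need to fill before the geometry is complete.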
News to Infographics
The process begins by summarizing news articles with GPT-3.5 Turbo and Davinci through LangChain. Videos are then generated with a fine-tuned Stable Diffusion 2.1 model, resulting in engaging and dynamic visual representations of the news stories.
Yoga Pose Correction
The trained model was quantized and converted to the TensorFlow Lite format, which made it easy to incorporate into Android applications and gave yoga enthusiasts a user-friendly tool to refine their practice and better understand different postures.
System and Method for Integer-only Quantization-aware Training for Edge
I developed the pseudo-cross entropy loss function and designed the quantization scheme for integer-only quantization-aware training. I also built an SDK that makes this system usable on low-power edge compute devices. The SDK has been successfully used to quantize models on Jetson and on vendors' custom hardware.
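For readers unfamiliar with quantization, the sketch below shows generic affine (scale and zero-point) quantization of floating-point values to unsigned integers and back. It is a textbook illustration only, not the quantization scheme or loss function from this project, and all function names are hypothetical.

```python
def quant_params(values, num_bits=8):
    """Compute a scale and zero point that map the value range onto [0, 2^bits - 1]."""
    qmin, qmax = 0, (1 << num_bits) - 1
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0           # degenerate range: every value is zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2^bits - 1], clamping out-of-range results."""
    qmax = (1 << num_bits) - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to approximate floating-point values."""
    return [scale * (q - zero_point) for q in q_values]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale, zero_point = quant_params(weights, num_bits=8)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Passing `num_bits=4` instead of 8 to both `quant_params` and `quantize` uses the same mapping with only 16 levels, which trades accuracy for a smaller representation.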
Fake News Classification
The project involved preprocessing text data, employing the SetFit model and LSTM, and developing an ensemble of SetFit and LSTM to identify fake news accurately.
Additionally, k-means clustering was used to group the fake news by type. The end goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.
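One common way to combine two classifiers is to average their predicted probabilities (soft voting). The sketch below assumes each model outputs a probability that an article is fake; the weighting, the threshold, and the sample scores are hypothetical, and the description above does not state how the SetFit and LSTM outputs were actually combined.

```python
def ensemble_predict(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Weighted average of two models' fake-news probabilities, thresholded to labels."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same articles")
    combined = [weight_a * pa + (1.0 - weight_a) * pb
                for pa, pb in zip(prob_a, prob_b)]
    labels = ["fake" if p >= threshold else "real" for p in combined]
    return combined, labels

# Hypothetical per-article probabilities from the two models.
setfit_probs = [0.9, 0.2, 0.6]
lstm_probs = [0.7, 0.4, 0.3]
combined, labels = ensemble_predict(setfit_probs, lstm_probs)
```

Raising `weight_a` gives the first model more influence, which can help when one model is known to be more reliable on the validation data.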
Text-to-video Generation for Mathematical Equations
Skills
Languages
Python, C++, Falcon, JavaScript, Bash Script
Frameworks
Flask, LlamaIndex, Django, Streamlit
Libraries/APIs
PyTorch, TensorFlow, scikit-learn, spaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, Keras, Fast.ai
Tools
You Only Look Once (YOLO), Git, Notion, Haystack, Azure Machine Learning, Whisper, Amazon SageMaker, Google Bard
Paradigms
Data Science, ETL, Azure DevOps, Continuous Development (CD), Continuous Integration (CI), Search Engine Optimization (SEO)
Platforms
Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, iOS, Linux, Amazon Web Services (AWS), Azure
Storage
MySQL, MongoDB, Databases
Other
Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, TensorFlow Lite, Machine Learning, LangChain, Statistics, Depth Estimation, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), GPT, Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks, Image Processing, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Text to Image, Diffusion Models, NLU, Deep Neural Networks, Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbot, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, OpenAI, HubSpot, APIs, HubSpot CRM, Retrieval Augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, DreamBooth, LoRA, Generative Adversarial Networks (GANs), K-means Clustering, Edge AI, Quantisation, Open Neural Network Exchange (ONNX), Pruning, Benchmarking, Object Detection, Machine Learning Operations (MLOps), Product Matching, Prompt Engineering, ControlNet, Gradio, Civitai, Videos
Education
Bachelor of Technology Degree in Computer Science
University School of Information, Communication and Technology - Dwarka, Delhi, India