Hire the Best CUDA Developers
Delhi, India
I am an AI/ML engineer with 3+ years building production machine-learning systems, backed by a research track record: two accepted peer-reviewed papers, two rounds as a Google Summer of Code contributor, and published GPU work.
What I can build for you:
• LLM & vision-language fine-tuning: QLoRA/PEFT, dataset creation, evaluation, deployment-ready inference.
• Self-hosted & open-source LLM deployment: running models on your own GPU servers for privacy, control, and lower cost.
• RAG systems & AI agents: retrieval grounded in your documents, data, and tools, with evaluation and monitoring so it stays reliable in production, not just in a demo.
• GPU & scientific computing: CUDA/CuPy/JAX acceleration, profiling, and pipelines that process terabytes of data.
• Full-stack delivery: FastAPI backends, React/Electron front-ends, Docker/Kubernetes, so the model ships as a real product your team can run.
How I work: define the use case, build the smallest reliable version first, test the edge cases, document everything, and hand you code your team can maintain after the contract ends.
Core stack: Python, PyTorch, Hugging Face, CUDA/CuPy, FastAPI, React, Docker, Kubernetes.
I am Top Rated with 100% Job Success and I reply within a few hours. Tell me what you're trying to build and I will build it for you.
- CUDA
- Web Development
- Machine Learning
- Deep Learning
- AI Agent Development
- Python
- PyTorch
- Computer Vision
- Data Science
- FastAPI
- React
- Docker
- Natural Language Processing
- Data Visualization
- API Development
- Kubernetes
- TensorFlow
- Generative AI
- Astrophysics
- Physics
Incheon, South Korea
I design, optimize, and integrate real-time object detection and tracking pipelines for NVIDIA Jetson, RK3588, and cloud environments, with measurable gains in FPS, latency, and deployment stability. From YOLO training to TensorRT optimization and production integration, I build Computer Vision systems that work reliably on real hardware. If you need more than just a trained model, that is, a working AI system integrated into hardware or software, I deliver complete, production-ready solutions.
🎯 WHAT I DELIVER
• End-to-End Deep Learning Pipelines: model architecture → dataset optimization → training → evaluation → deployment
• Real-Time Object Detection & Multi-Object Tracking: optimized YOLO pipelines with stable tracking (DeepSORT / ByteTrack / BoT-SORT)
• ⚡ Edge AI Acceleration & Performance Optimization: TensorRT conversion, CUDA acceleration, latency reduction, memory tuning, FPS improvements
• 🔗 AI Integration into Production Systems: Jetson deployment, inference APIs, embedded integration, debugging, monitoring & system optimization
🏆 PROVEN EXPERIENCE
• 5★ Upwork reviews for NVIDIA Jetson Nano, AGX & Orin deployments
• Deployed Frigate and real-time CV pipelines on Jetson Orin NX
• Optimized inference performance using TensorRT for improved real-time execution
• Installed and configured LLM environments with secure key management & usage tracking
• Research & production AI deployment experience at HBrain
🛠 TECH STACK
Deep Learning: PyTorch, CNN architectures, YOLO variants
Computer Vision: OpenCV, real-time video processing, detection & tracking
Optimization: TensorRT, CUDA
Systems: Linux, Embedded Systems, NVIDIA Jetson
Languages: Python, C++
If you share your hardware target, dataset sample, or performance goal, I can propose a clear technical architecture and realistic delivery plan.
- Computer Vision
- Artificial Intelligence
- Edge AI
- Object Detection & Tracking
- NVIDIA Jetson
- YOLO
- Deep Learning
- Image Segmentation
- Anomaly Detection
- Python
- AI Model Integration
Al Mansurah, Egypt
👋 Hi, I’m Mohamed, a Top Rated 🏆 Senior Software Engineer.
🚀 I help companies design, build, and deliver high-performance software systems, from computer vision and AI pipelines to real-time desktop applications and embedded platforms.
🧩 What I deliver:
✅ Industrial Computer Vision Systems
🔹 Integration with industrial cameras (Basler, Luxonis, FLIR / Point Grey)
🔹 High-accuracy inspection, measurement, and defect detection
🔹 Optimized OpenCV pipelines in C++ and Python
🔹 Hybrid pipelines combining deep learning with classical computer vision for robustness
✅ Deep Learning (ResNet, U-Net, YOLO, MediaPipe, ...)
🔹 Custom dataset creation, labeling, and curation
🔹 Inference optimization for CPU, GPU, and edge devices
🔹 YOLO for real-time object detection, segmentation, OBB, and pose estimation
🔹 YOLO fine-tuning and transfer learning for domain-specific data
🔹 MediaPipe for pose estimation, hand tracking, face landmarks, and motion analysis
✅ Real-Time & Performance-Critical Software
🔹 Low-latency C++ systems optimized for speed and memory
🔹 Multithreading, SIMD, profiling, and algorithmic optimization
🔹 GPU acceleration (CUDA) when justified
🔹 Designed for long-running, production-grade operation under real-world constraints
✅ Embedded & Edge AI
🔹 Vision and AI deployment on Raspberry Pi
🔹 Sensor integration, control logic, and hardware-in-the-loop testing with ESP32, Arduino
✅ Full Project Ownership
🔹 System architecture and technical leadership
🔹 Desktop GUIs: Qt, MFC, OpenGL
🔹 Clean handover, documentation, and maintainable codebases
👉 Let's discuss how I can bring your AI, computer vision, or embedded system project to life.
- C++
- Python
- Qt Framework
- Computer Vision
- OpenCV
- Deep Learning
- PyTorch
- Machine Learning
- QML
- YOLO
- MATLAB
- OpenGL
- Microsoft Foundation Class Library
- Raspberry Pi
- ESP32
- Robotics
Gandhinagar, India
I help startups and AI companies build high-performance AI systems, GPU-accelerated applications, and scalable AI infrastructure using CUDA, Rust, and Python. My expertise focuses on performance engineering, AI infrastructure, parallel computing, and low-latency systems designed for production-scale workloads. I work on optimizing compute-heavy applications, improving inference performance, and building reliable backend systems for modern AI products.
I specialize in:
• CUDA & GPU Optimization
• AI Inference Optimization
• Rust-based High Performance Systems
• AI Infrastructure Engineering
• Parallel Computing
• Low-Latency Backend Systems
• Python-based AI & Automation Systems
• LLM Infrastructure
• RAG Architecture & AI Search
• Scalable API & System Architecture
I can help with:
✅ GPU acceleration and CUDA optimization
✅ AI inference performance tuning
✅ Rust backend systems for high-performance workloads
✅ AI infrastructure and deployment pipelines
✅ Memory and compute optimization
✅ Parallel processing systems
✅ AI search and RAG pipelines
✅ Python automation and backend development
✅ Scalable distributed systems
✅ Production-ready AI architecture
Tech Stack & Tools: CUDA, Rust, Python, PyTorch, TensorRT, ONNX, Docker, FastAPI, PostgreSQL, Vector Databases, REST APIs, WebSockets, AWS, Azure, and cloud-native architectures.
I focus on building systems that are performant, scalable, efficient, reliable, and production-ready.
If you are building GPU-intensive applications, AI infrastructure, inference systems, or performance-critical software, I’d be happy to discuss your project. Send me a message, and let’s discuss how we can make it happen!
- CUDA
- Generative AI
- AI Agent Development
- Chatbot Development
- GPU
- Python
- MERN Stack
- Mobile App Development
- Web Application Development
- AI Chatbot
- Vector Database
- DevOps
- AWS Development
- LLM Prompt Engineering
Alboraya, Spain
🏆 Top Rated for more than 10 years on Upwork | 🏆 70+ projects | 🏆 100% Job Success Score | 🏆 16 years of experience | 🏆 Working with 3 enterprise companies and many start-ups and individual clients
Production-grade computer vision systems for real-world environments: edge deployment, system integration, and measurable business outcomes.
I graduated as an Electrical Engineer, M.Sc. (Embedded Software Developer faculty). I am an expert in AI/machine learning (object detection) and IoT/edge computing (Raspberry Pi and Jetson Orin NX/Nano, DeepStream 6.x and 7.x, Triton/TAO Toolkit). I prefer long-term projects, but you can also hire me as a consultant for shorter ones.
Strengths:
- Computer vision/machine learning project architecture and design: technical stack, cost of long-term usage, refining user requirements, etc.
- Python machine learning projects (TensorFlow, Keras, PyTorch), especially computer vision: object detection (YOLO), image segmentation, pose tracking (OpenPose), shape recognition (dlib), and action recognition (X3D, mmaction, etc.)
- Developing, maintaining, and upgrading your existing DeepStream/Triton-based application
- Embedded image processing solutions (edge computing): computer vision on Jetson Orin (NX/Nano) with DeepStream 7.0 (earlier versions too: 6.x/7.x, Triton, TAO Toolkit) or on Raspberry Pi, using the Intel Movidius Neural Compute Stick (NCS) + OpenVINO. I train models for a specific purpose and provide dataset collection/preparation/preprocessing task management; I specialize in object detection (up to YOLOv12)
- AWS and GCP (Google Cloud Platform) VM inference and training setup, execution, and deployment
- NVIDIA GPU experience: Tesla T4, L4, A100, and H100 GPUs for training and inference/server production
- Home automation/smart home: lighting, building engineering, heating/cooling, Home Assistant (integrating Amazon Alexa, Philips Hue, Sonos, Apple TV, Google Nest, custom solutions)
- Raspberry Pi (Linux, Raspbian): programming and teaching in Python (RS232, RS485, TCP communication), Wi-Fi settings, Home Assistant
Services:
- Short (30-60 minutes) consultation
- AI Architect tasks: responsible for the architecture of AI solutions, which includes planning, implementing, and managing AI technologies within an organization or project
I speak English and Hungarian (magyar) and a little Spanish (castellano, español).
AI Computer Vision | MLOps | Edge Computing | AI Automation | AI strategy | AI consultant
- AI Consulting
- AI Development
- Rapid Prototyping
- AI Model Integration
- Computer Vision
- NVIDIA Jetson
- NVIDIA Triton
- AI Model Development
- Machine Learning
- Raspberry Pi
- Python
- OpenCV
- Automation
- ML Automation
- Prototype
Warsaw, Poland
Hello! My name is Aleksandr Kalinin. I have a double major in Math and CS. I have been programming professionally since 2006.
Skills:
C++ 11/14/17/20/23, STL, SIMD (SSE, AVX, AVX512), CMake, vcpkg, Conan, Boost, perf, VTune.
Python, Data Science, NumPy, Pandas, Matplotlib.
Rust, tokio.
Algorithms, Data Structures, Computational Geometry, Optimization, Linear Algebra, Numerical Methods.
Concurrent Programming, Multithreading, Lock-free Programming.
CUDA, HIP, OpenCL, Compute Shaders, Halide, TBB, oneAPI.
Graphics Programming, Vulkan, Metal, DirectX 12, OpenGL, HLSL, GLSL, MSL, Slang, Real-time Rendering, Custom Engine Development.
Unreal Engine 4/5, Godot, Cocos2d-x.
Machine Learning, Deep Learning, PyTorch, TensorFlow, Scikit-Learn, ONNX, TensorRT.
Computer Vision, Image Processing, OpenCV, Roboflow, Triton, YOLO, YOLOv11, DETR, RT-DETR, Faster R-CNN, R-CNN, U-Net, DeepLabV3+, SAM, SAM2, SegFormer, ResNet, ConvNeXt, ViT, Swin.
Optimization / Solvers: OR-Tools (CP-SAT), Z3, MiniSAT, SCIP, HiGHS, IPOPT, MiniZinc.
Point Cloud Processing, Collision Detection, Physics Simulation.
Maya API, Houdini API (HDK, VEX, HScript), Blender Python API, Pipeline Tools Development.
Software Porting, Game Development.
Qt, QML, PySide6, wxWidgets, VTK.
iOS Development, Swift, Objective-C, UIKit, AVFoundation, Metal.
Android Development, NDK.
FFmpeg, WebRTC, RTMP, RTSP, SRT, HLS, NDI, MPEG-DASH, CMAF, WebTransport, QUIC, UDP/RTP/RTCP, TCP, WebSockets, HTTP Live Streaming pipelines, Video Processing, Real-time Streaming Systems.
JavaScript, Node.js, Three.js, Pixi.js.
Docker, Kubernetes, AWS, EKS, CI/CD, n8n.
PostgreSQL, Supabase.
Windows, Mac, Linux.
AI, Claude Code.
- OpenCV
- C++
- Python
- Machine Learning
- Mathematica
- Mathematics
- MATLAB
- Image Processing
- DirectX
- OpenGL
- Microsoft Windows
- Geometry
- 3D Graphics Framework
- CMake
- Computer Graphics
How it works
Post a job for free
Tell us what you need. Create your own job post or generate one with AI, then filter talent matches.
Hire top talent fast
Consult, interview, and hire quickly, so you can meet the freelancers you're excited about.
Collaborate easily
Use Upwork to chat or video call, share files, and track project progress right from the app.
Payment simplified
Manage payments in one place with flexible billing options. Only pay for approved work, hourly or by milestone.
Don't just take our word for it
“Upwork provides an umbrella-level of security. I can see a talent’s work history and ratings. I can hold payments in escrow. I can communicate through Upwork Messages instead of working through my email address.”
Kim Darling
Emerald Tiger
“Upwork is the best platform to hire skilled professionals when we're not looking for a full-time employee. All the companies in our portfolio use Upwork to find talent across a wide range of fields.”
David Merry
Kinetic Investments
“Our very specific requirements can be a challenge—With Upwork, we’re able to access a bigger community to ensure the success of our projects.”
Katja Krohn
Summa Linguae
At A Glance: CUDA
Programmers and designers use graphics processing units (GPUs) to build detailed applications that improve the user experience. For businesses, GPUs support a range of needs, such as market analysis, data processing, ad creation and placement, and much more. CUDA, which stands for Compute Unified Device Architecture, is NVIDIA’s parallel computing platform and programming model; it makes this power accessible by letting developers of every level program GPUs with less effort and confusion. Maybe you can’t spare the time from your other duties, or perhaps you lack the skills that CUDA requires. There’s no need to miss out on its advantages: the CUDA specialists on Upwork are here to help your business thrive, with competitive rates, diverse skill sets, and flexible hours.
Freelancers on Upwork bring many years of experience across a diverse range of jobs, which has made them proficient in a wide range of specialties. This lets them apply their expertise to any project and offer unique insights and suggestions, as well as solutions to bugs and technical issues. These experts are also very familiar with the online workplace, so they can collaborate remotely with other teams or work independently with little supervision. With thousands of experts to choose from on Upwork, you’re sure to find a professional with the experience, education level, and work ethic you need for your project.
Similar CUDA Developer Skills
- MATLAB Developers
- OpenCL Developers
- MATLAB Experts
- Julia Developers
- Numpy Professionals
- ChatGPT Developers
- CUDA Consultants
- fastText Specialists
- QML Developers
- AI Developers
- Deep Learning Experts
- Reinforcement Learning Specialists
- Bayesian Statistics Developers
- Machine Learning Engineers
- TensorFlow Specialists
- Unsupervised Learning Specialists
Top Countries for CUDA Developers
- CUDA Developers in Egypt
- CUDA Developers in India
- CUDA Developers in Pakistan
- CUDA Developers in Bangladesh
- OpenCV Developers in India
- OpenCV Developers in Pakistan
- Deep Learning Experts in Ukraine
- Deep Learning Experts in South Korea
- Deep Learning Experts in Algeria
- Deep Learning Experts in Israel
- Deep Learning Experts in Armenia
- Deep Learning Experts in Georgia
- Deep Learning Experts in Germany
- Deep Learning Experts in France
- Deep Learning Experts in Ethiopia
- Deep Learning Experts in Egypt