Keys to AI: Feature Comparison (July 2025)

Legend:

🆓 Free to use (≥30 min/day)
(blank) Requires payment
✔️ Open source (downloadable model)
🌐 Free online demo
Ctx Context window (tokens)
MMLU Multidisciplinary comprehension (0-100%)

🧪 Research & Academic Tools

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| Perplexity Pro | Ctx: 128K · CiteScore: 9.1 · Papers: 200M+ | ✔️ (5 searches/day) | |
| Gemini 2.5 Pro | Ctx: 1M · Docs: 10+ formats · STEM: 92% | | |
| Consensus Academic | Acc: 89% · Sources: 1B+ | ✔️ Full | |
| Elicit AI | Auto-Synthesis: 95% · PDF Analysis | ✔️ (limited) | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| MANUS-Research | Ctx: 64K · PubMed Acc: 82% · ArXiv: 78% | ✔️ (HF) | ✔️ (HF) |
| SciSpace Copilot | PDF Q&A · Citation Extraction | ✔️ (10 PDFs/month) | |
| Haystack | RAG Framework · Doc Search: 88% | ✔️ (GitHub) | ✔️ (GitHub) |
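
Haystack stands out in this tier because it is a framework, not a hosted service: the retrieval pipeline runs entirely on your own machine. A minimal BM25 document-search sketch, assuming Haystack 2.x (the documents and query are placeholders, and import paths can shift between versions):

```python
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

# Index a few placeholder documents in an in-memory store.
store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Retrieval-augmented generation grounds LLM answers in sources."),
    Document(content="BM25 is a lexical ranking function used for document search."),
])

# BM25 retriever over the store; run() returns the top-k matching documents.
retriever = InMemoryBM25Retriever(document_store=store)
result = retriever.run(query="How does RAG ground answers?", top_k=1)
for doc in result["documents"]:
    print(doc.content)
```

The same retriever drops into a larger Haystack `Pipeline` when you want to chain it with a generator for full RAG.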

💻 Programming & Code Generation

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| GPT-4.5 Turbo | HumanEval: 82.3% · Repo Analysis: 91% | | |
| DeepSeek-Coder v2 | HumanEval: 80.5% · MultiLang: 40+ | ✔️ Full | ✔️ (HF) |
| Grok-4 Heavy | LiveCodeBench: 79.4% · Debug Acc: 87% | | |
| CodeGemma 1.1 | Python-focused · MBPP: 84.1% | ✔️ (demo) | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| MANUS-Coder 7B | HumanEval: 76% · CodeClarity: 8.2/10 | ✔️ (HF) | ✔️ (HF) |
| WizardCoder-34B | HumanEval: 79.1% · Python: 92% | ✔️ (HF) | ✔️ (HF) |
| CodeLlama 3 70B | GSM8K-Python: 81.3% · SWE-bench: 68% | ✔️ (HF) | ✔️ (HF) |
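
Every HF-tagged coder above can be run locally with Hugging Face `transformers`. A hedged sketch, assuming the `deepseek-ai/deepseek-coder-6.7b-instruct` checkpoint (chosen here for modest hardware; any open checkpoint from the table slots in the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; swap in whichever open coder checkpoint fits your GPU.
model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Chat-style prompt; the tokenizer's chat template formats it for the model.
messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```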

🎨 Image Generation

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| MidJourney v6.5 | Aesthetics: 9.3/10 · Gen Speed: 2s | | |
| DALL·E 4 | Real-Time Editing · BLEU-Image: 41.2 | ✔️ (15 gen/day) | |
| Ideogram 2.5 | Text Legibility: 95% · Styles: 100+ | ✔️ (100 gen/month) | |
| Playground v3 | Realism: 8.9/10 | ✔️ (20 gen/day) | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| MANUS-Diffusion | FID: 18.3 · CLIP: 32.1 | ✔️ (HF) | ✔️ (HF) |
| Stable Diffusion 3.5 | Custom LoRAs · Community Models | ✔️ (local) | ✔️ (GitHub) |
| Fooocus | 1-Click Install · SDXL-Optimized | ✔️ Full | ✔️ (GitHub) |
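
Fooocus wraps SDXL behind a one-click UI, but the same family of open checkpoints is scriptable with Hugging Face `diffusers`. A minimal text-to-image sketch, assuming the SDXL base checkpoint and a CUDA GPU (the prompt and sampler settings are placeholders; check each model card for recommended values):

```python
import torch
from diffusers import DiffusionPipeline

# SDXL base weights; other open checkpoints from the table load the same way.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# One prompt in, one PIL image out.
image = pipe(
    "a watercolor lighthouse at dawn",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("lighthouse.png")
```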

🎥 Video Generation

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| Runway Gen-3 Alpha | 1080p · Motion Smoothness: 9.1/10 | | |
| Pika Labs 1.2 | Open Beta · Styles: 50+ | ✔️ | |
| Google Veo 3 | 120s Clips · Physics Accuracy: 94% | | |
| Kaiber 2.0 | Music Sync | ✔️ (30s free/day) | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| MANUS-Video | UCF101: 82% · Kinetics Acc: 76% | ✔️ (HF) | ✔️ (HF) |
| Stable Video | Apache 2.0 · 576x1024 resolution | ✔️ (HF) | ✔️ (HF) |
| VideoCrafter 2 | 4K Support · Active on GitHub | ✔️ Full | ✔️ (GitHub) |
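
Stable Video (Stable Video Diffusion) is also usable from `diffusers` as an image-to-video pipeline. A hedged sketch at the 576x1024 resolution cited in the table, assuming the img2vid-xt checkpoint and a local conditioning image:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

# Conditioning frame at the model's native size (width x height = 1024x576).
image = load_image("lighthouse.png").resize((1024, 576))

# Generates a short clip of frames from the single input image;
# decode_chunk_size trades VRAM for decoding speed.
frames = pipe(image, decode_chunk_size=4).frames[0]
export_to_video(frames, "lighthouse.mp4", fps=7)
```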

🧠 Deep Reasoning

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| Claude 3.5 Opus | Ctx: 200K · MMLU: 89.3% · MATH: 84% · AGIEval: 45.1 | ✔️ (limited) | |
| Gemini 2.5 Pro | Ctx: 1M · MMLU: 88.5% · MATH: 81.2% · ARC: 91.7% | ✔️ (Gemini Pro) | |
| Grok-4 | Ctx: 128K · MMLU: 79% · MATH: 77.5% · HellaSwag: 89% | ✔️ (X Premium) | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| DeepSeek-R1 | Ctx: 128K · MMLU: 83.1% · MATH: 97.3% · GSM8K: 92.5% | ✔️ Full | ✔️ |
| Qwen 1.5 110B | Ctx: 128K · MMLU: 81.5% · MATH: 58.3% · C-Eval: 84% | 🌐 (demo) | ✔️ |
| Mixtral-8x22B | Ctx: 64K · MMLU: 81.4% · ARC: 85.2% · HellaSwag: 87.3% | 🌐 | ✔️ |
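
Besides downloading the weights, DeepSeek-R1 can be reached through DeepSeek's OpenAI-compatible endpoint, so the stock `openai` client works unchanged. A sketch, assuming the `deepseek-reasoner` model name and a `DEEPSEEK_API_KEY` environment variable (both may differ from your account's setup):

```python
import os
from openai import OpenAI

# OpenAI-compatible API: only the base_url and model name change.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed name for the R1 reasoning model
    messages=[{
        "role": "user",
        "content": "A train leaves at 9:40 and arrives at 12:05. How long is the trip?",
    }],
)
print(response.choices[0].message.content)
```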

🤖 Autonomous Agents

| Model | Key Benchmarks | 🆓 Free Tier | 🔓 Open Source |
|---|---|---|---|
| GPT-4.5 | SWE-bench: 82.1% · AgentBench: 8.7 · ToolCall: 96% · Ctx: 128K | | |
| Claude 3.5 | AgentBench: 8.9 · WebArena: 72.4% · Ctx: 200K · ToolCall: 94% | ✔️ (limited) | |
| Gemini 2.5 Pro | Ctx: 1M · AgentBench: 9.1 · SWE-bench: 78% · ToolCall: 97% | | |
| **⇩ OS/Free Alternatives ⇩** | | | |
| DeepSeek-R1 | AutoGen: 85% · AgentBench: 8.2 · Ctx: 128K · SWE-bench: 75.6% | ✔️ Full | ✔️ |
| Kimi K2 | Ctx: 128K · AgentBench: 7.9 · ToolCall: 86% · WebArena: 68% | ✔️ (30 min/day) | 🌐 (API) |
| MANUS-Agent | ToolCall: 84% · AgentBench: 7.7 · Ctx: 64K · WebArena: 65% | ✔️ | ✔️ |
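
The ToolCall scores above measure how reliably a model emits well-formed function calls. A minimal sketch of the round-trip such benchmarks exercise, using the OpenAI-style tool-calling schema (the `get_weather` tool and the model name are placeholders, not part of any benchmark):

```python
import json
from openai import OpenAI

client = OpenAI()  # any endpoint that speaks the OpenAI tool-calling schema

# Placeholder tool: the model only sees this JSON-schema description.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any tool-capable model from the table
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)

# A ToolCall-style benchmark scores whether this call is well-formed and on-target.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```

In a real agent loop, the tool result would be appended as a `tool` message and the model called again until it produces a final answer.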
