Sonnet 5, Copilot browser tools, and Nano Banana 2 Lite — AI Digest for July 1, 2026

Agentic coding got the clearest practical upgrades today: Anthropic pushed a cheaper Sonnet-class model into general availability, while GitHub moved browser-driving tools for Copilot in VS Code out of preview. The quieter signal is cost control. Google is pushing fast image generation down to low-cent pricing, GitHub is routing CLI work to cheaper models when it can, and NVIDIA published a GPU query-engine reference design for teams trying to make data systems less CPU-bound.

The short version

Update	What changed	Builder take
Claude Sonnet 5	Anthropic says Sonnet 5 is available across Claude plans, Claude Code, and the Claude Platform, with API model name `claude-sonnet-5` and introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026. 1	Worth testing for agentic coding if Opus-class models are too expensive, but treat Anthropic's benchmark claims as vendor-reported until you run your own tasks.
GitHub Copilot in VS Code and CLI	Browser tools for Copilot in VS Code are now generally available, and Copilot CLI can use auto model selection based on task type, model health, and token efficiency. 2 3	The agent loop is moving closer to real web-app testing. Admins should check site allow/deny policies before turning this loose in sensitive environments.
Google Nano Banana 2 Lite and Gemini Omni Flash	Google released Nano Banana 2 Lite for image generation and Gemini Omni Flash for video generation/editing in AI Studio, the Gemini API, and enterprise tooling. Google quotes 4-second text-to-image latency for Nano Banana 2 Lite and $0.10 per second of video output for Gemini Omni Flash. 4	Useful if you build creative or commerce workflows where iteration speed matters more than maximum image quality. Keep provenance and watermarking in the product design.
OpenAI GeneBench-Pro	OpenAI introduced a 129-question computational biology benchmark for agentic scientific analysis, with 10 public representative questions on Hugging Face. 5	The useful part is not the headline score. It is the benchmark design: messy datasets, deterministic grading, and explicit tests of judgment under uncertainty.
NVIDIA GQE	NVIDIA published GQE, an open-source GPU Query Engine reference architecture built on cuDF, nvCOMP, CCCL, and NVSHMEM. NVIDIA reports a 7.5x aggregate speedup over a DuckDB CPU baseline on its TPC-H SF1000 experiment, while noting the results are not official TPC-H-compliant numbers. 6	Relevant for database and analytics teams asking where GPUs help after the model layer: compression, transfer scheduling, and pruning may matter as much as kernels.
GitHub license compliance	GitHub detailed a new License Compliance feature for GitHub Advanced Security customers that checks new dependencies against license policy in pull requests. 7	Less flashy than model launches, but directly useful for teams scaling open-source dependency intake. Start in evaluate mode before blocking merges.

Anthropic makes Sonnet-class agents cheaper to try

Claude Sonnet 5 is Anthropic's new mid-tier workhorse for coding, tool use, and agentic tasks. Anthropic says it is now the default model on Free and Pro plans and is also available to Max, Team, and Enterprise users, in Claude Code, and on the Claude Platform. The API model name is claude-sonnet-5. 1

The pricing matters more than the naming. Sonnet 5 launches at $2 per million input tokens and $10 per million output tokens through August 31, 2026, then moves to $3 and $15 respectively. Anthropic also says the updated tokenizer can map the same input to roughly 1.0x to 1.35x as many tokens depending on content type, so the same text can bill as more tokens than before. Compare cost per task, not list prices alone. 1
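A minimal sketch of that comparison, using only the prices and the tokenizer range quoted above. The function and constant names are illustrative. Applying the tokenizer factor to output tokens as well as input tokens is an assumption, since the announcement describes the effect in terms of input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Prices quoted in the announcement.
SONNET5_INTRO = Pricing(input_per_mtok=2.0, output_per_mtok=10.0)  # through August 31, 2026
SONNET5_LIST = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)   # afterwards


def job_cost(pricing, input_tokens, output_tokens, tokenizer_factor=1.0):
    """Estimated USD cost of one job.

    tokenizer_factor scales token counts that were measured with an older
    tokenizer (1.0 to 1.35 per the announcement). Scaling output tokens too
    is an assumption made for this sketch.
    """
    scaled_in = input_tokens * tokenizer_factor
    scaled_out = output_tokens * tokenizer_factor
    return (scaled_in * pricing.input_per_mtok
            + scaled_out * pricing.output_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 100K output tokens.
    for factor in (1.0, 1.35):
        intro = job_cost(SONNET5_INTRO, 1_000_000, 100_000, factor)
        later = job_cost(SONNET5_LIST, 1_000_000, 100_000, factor)
        print(f"factor={factor}: intro=${intro:.2f}, list=${later:.2f}")
```

Run this with token counts from your own logs. The spread between the two factors is the uncertainty you carry until you re-tokenize real prompts.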

For builders, the basic test is simple: take one messy repository task that currently needs Opus, GPT-5.6 Sol, or manual follow-through, then run it through Sonnet 5 at multiple effort settings. Anthropic says Sonnet 5 narrows the gap with Opus 4.8 on agentic search and computer-use evaluations, but the post's strongest claims are still vendor-side results. 1
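One way to structure that test is to score each setting by cost per successful completion rather than by raw pass rate. The sketch below is hypothetical scaffolding: `run_task` stands in for whatever harness actually invokes the model and checks the result, and the setting names are placeholders, not real API values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RunResult:
    passed: bool     # did the task complete correctly (your own check)?
    cost_usd: float  # what the attempt cost, including failed attempts


def cost_per_success(results: List[RunResult]) -> float:
    """Total spend divided by number of passing runs; inf if nothing passed."""
    successes = sum(1 for r in results if r.passed)
    total = sum(r.cost_usd for r in results)
    return float("inf") if successes == 0 else total / successes


def compare_settings(
    run_task: Callable[[str], RunResult],
    settings: List[str],
    trials: int = 3,
) -> Dict[str, float]:
    """Run the same task `trials` times per setting and report cost per success."""
    return {
        setting: cost_per_success([run_task(setting) for _ in range(trials)])
        for setting in settings
    }
```

Counting the cost of failed attempts matters: a cheap setting that fails half the time can cost more per finished task than an expensive one that succeeds.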

GitHub also says Claude Sonnet 5 is available in GitHub Copilot for Pro, Pro+, Max, Business, and Enterprise users, including VS Code, Visual Studio, Copilot CLI, GitHub's cloud agent, JetBrains, Xcode, Eclipse, GitHub Mobile, and github.com. Enterprise and Business admins can enable it through model policy settings, and GitHub says it operates under Zero Data Retention like other Sonnet models in Copilot. 8

Copilot agents get closer to real app testing

GitHub moved browser tools for Copilot in VS Code to general availability. The feature lets agents open pages, navigate, click, type, hover, drag, handle dialogs, read page content, capture console errors, take screenshots, and run scripted flows when that is more efficient than repeated tool calls. 2

The control model is the part to inspect before relying on it. GitHub says tabs you opened are private until you select Share with Agent; tabs opened by the agent run in fresh sessions without your normal cookies or storage; and camera, microphone, and geolocation requests are denied by default. Admins get a dedicated switch plus site allow/deny controls. 2
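To reason about an allow/deny policy before enabling the feature, it can help to write down the rule you expect and test it against the URLs an agent might visit. The sketch below is a generic illustration and not GitHub's implementation. The pattern syntax and the deny-takes-precedence rule are assumptions made for this example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_allowed(url, allow_patterns, deny_patterns):
    """Return True if the URL's host matches an allow pattern and no deny pattern.

    Assumed semantics for this sketch: deny wins over allow, and anything
    not explicitly allowed is rejected.
    """
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    if any(fnmatchcase(host, pattern) for pattern in deny_patterns):
        return False
    return any(fnmatchcase(host, pattern) for pattern in allow_patterns)


if __name__ == "__main__":
    allow = ["*.example.com", "localhost"]  # hypothetical staging hosts
    deny = ["admin.example.com"]            # hypothetical sensitive host
    for url in (
        "https://staging.example.com/login",
        "https://admin.example.com/users",
        "http://localhost:3000/",
        "https://unknown.test/",
    ):
        print(url, is_allowed(url, allow, deny))
```

Check the real policy's documented precedence before relying on any assumption like the one above.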

Copilot CLI also gained auto model selection. GitHub says the CLI evaluates model availability, reliability, task type, reasoning need, code-generation complexity, bug-diagnosis difficulty, and tool-orchestration need before picking a model. Users can still switch back to a specific model with /model, and admin policies still apply. 3

The cost detail is worth noting. Auto currently uses models with 0x to 1x multipliers, and paid subscribers get a 10% discount on the model multiplier when auto mode selects a model. That is GitHub nudging developers toward routing by task rather than reflexively choosing the biggest model. 3
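The arithmetic is small but worth making explicit. This sketch assumes the 10% discount applies multiplicatively to the model multiplier, as the description suggests. How that maps onto your actual bill depends on your plan.

```python
def effective_multiplier(model_multiplier, auto_selected, paid_subscriber, discount=0.10):
    """Multiplier after the auto-mode discount described above.

    Assumption: the discount applies only when auto mode picked the model
    and the user is on a paid plan.
    """
    if auto_selected and paid_subscriber:
        return model_multiplier * (1.0 - discount)
    return model_multiplier


if __name__ == "__main__":
    print(effective_multiplier(1.0, auto_selected=True, paid_subscriber=True))   # discounted
    print(effective_multiplier(1.0, auto_selected=False, paid_subscriber=True))  # manual pick
    print(effective_multiplier(0.0, auto_selected=True, paid_subscriber=True))   # 0x model
```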

Google pushes generative media toward fast iteration

Google's Nano Banana 2 Lite is the new speed-and-cost member of its Gemini Image family. Google describes it as gemini-3.1-flash-lite-image, says it is available in Google AI Studio, the Gemini API, and Gemini Enterprise Agent Platform, and recommends it as the replacement for the older gemini-2.5-flash-image Nano Banana model. 4

The headline numbers: Google says Nano Banana 2 Lite can produce text-to-image outputs in 4 seconds and costs $0.034 per 1K image. It is aimed at rapid ideation, A/B creative variation, storyboarding, virtual try-on, and other high-volume pipelines where waiting on each render kills the workflow. 4

Gemini Omni Flash is the video side of the release. Google says it is now available to developers through Google AI Studio and the Gemini API, supports text, image, and video inputs, and is priced at $0.10 per second of video output. The limitations are important: generated clips are currently 10 seconds long, audio references and scene extension are not yet supported in the Gemini API, and video references up to 3 seconds are accepted by the API schema but not correctly processed by the model at this time. 4
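For budgeting, the quoted prices translate into per-batch costs as sketched below. The sketch reads "$0.034 per 1K image" as a per-image price at 1K resolution, which is an assumption. The function names are illustrative.

```python
IMAGE_PRICE_USD = 0.034            # quoted price, assumed to be per image
VIDEO_PRICE_PER_SECOND_USD = 0.10  # quoted price per second of video output
CLIP_SECONDS = 10                  # current generation length per the post


def image_batch_cost(n_images):
    """USD cost of generating n_images images."""
    return n_images * IMAGE_PRICE_USD


def video_batch_cost(n_clips, seconds_per_clip=CLIP_SECONDS):
    """USD cost of generating n_clips clips of the given length."""
    return n_clips * seconds_per_clip * VIDEO_PRICE_PER_SECOND_USD


if __name__ == "__main__":
    print(f"1,000 image variations: ${image_batch_cost(1000):.2f}")
    print(f"One 10-second clip:     ${video_batch_cost(1):.2f}")
    print(f"50 clips:               ${video_batch_cost(50):.2f}")
```

Retries are not included. If a share of generations is rejected or regenerated, scale the counts accordingly.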

If you are building with these models, bake in content provenance from the start. Google says Gemini Omni and Nano Banana 2 Lite use SynthID watermarking, and Google Cloud says C2PA content credentials and imperceptible SynthID watermarks are enabled by default for both models in its enterprise platform. 4 9

OpenAI's GeneBench-Pro is a benchmark design story

GeneBench-Pro is OpenAI's new benchmark for agentic computational biology. It has 129 questions across genomics, quantitative biology, and translational medicine, and OpenAI says each problem gives an agent a dataset, experimental context, and a target estimand tied to a downstream decision. In plain English: the model has to decide what analysis is valid, not just run a known script. 5

OpenAI says it built the problems synthetically so it knows the underlying causal structure and can grade deterministically. That avoids a common benchmark problem where two defensible analysis choices produce different answers and the grader rewards one arbitrary path. OpenAI also says it is open-sourcing 10 representative questions on Hugging Face and will provide a 50-question subset to Artificial Analysis for independent benchmarking. 5
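The same idea can be sketched for an internal benchmark: generate data from a known effect, then grade any submitted estimate against that ground truth with a fixed tolerance. This is a toy illustration of the pattern, not OpenAI's generator or grader. The effect size, noise model, and tolerance are all assumptions.

```python
import random


def make_problem(seed, true_effect=2.0, n=500):
    """Synthetic two-arm dataset with a known treatment effect.

    Returns (rows, true_effect), where each row is (treated, outcome).
    Seeding makes the dataset reproducible across runs.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        outcome = (true_effect if treated else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((treated, outcome))
    return rows, true_effect


def difference_in_means(rows):
    """One valid analysis for this simple design: treated mean minus control mean."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def grade(answer, truth, rel_tol=0.15):
    """Deterministic pass/fail: answer must land within rel_tol of the known truth."""
    return abs(answer - truth) <= rel_tol * abs(truth)


if __name__ == "__main__":
    rows, truth = make_problem(seed=0)
    estimate = difference_in_means(rows)
    print(f"estimate={estimate:.3f} truth={truth} passed={grade(estimate, truth)}")
```

Because the grader compares against the generating parameter rather than a reference analysis, any valid method that recovers the effect passes.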

The headline result is that GPT-5.6 Sol reaches a 28.7% pass rate at the highest reasoning level, or 31.5% with Pro mode enabled. OpenAI says reviewers estimated a typical problem would take a human expert 20-40 hours. Both claims should be read carefully: this is OpenAI evaluating its own models on a benchmark it built, and the human-time figure is a reviewer estimate rather than a measured head-to-head productivity study. 5

The builder lesson is broader than biology. Good agent benchmarks are moving toward messy inputs, tool use, hidden data issues, deterministic scoring, and explicit failure analysis. If your company is evaluating internal agents, GeneBench-Pro is a useful pattern to copy even if you never touch genomics.

NVIDIA shows where GPU data systems work gets concrete

NVIDIA's GQE post is a technical reference for GPU-accelerated SQL execution. GQE accepts Substrait query plans, uses cuDF for relational operators, uses nvCOMP for compression and decompression, and can use GPU memory, CPU memory, or disk-backed data sources. 6

The important engineering pattern is not simply "run SQL on the GPU." NVIDIA spends most of the post on data movement: columnar layout, row groups, compressed transfers, partition pruning with zone maps, pipelined host-to-device movement, and batched copies with cudaMemcpyBatchAsync. On the TPC-H SF1000 experiment (roughly 1 TB of data), NVIDIA says filter pruning skips 31% of data across all 22 queries and adds about 2.2 ms of overhead on average. 6
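Zone-map pruning is easy to show in miniature. The sketch below keeps a min and max per row group and skips any group whose range cannot overlap the filter. It is a plain-Python illustration of the general technique and not GQE's implementation. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ZoneMap:
    min_value: int
    max_value: int


def build_zone_maps(row_groups: List[List[int]]) -> List[ZoneMap]:
    """Record the min and max of each (non-empty) row group."""
    return [ZoneMap(min(group), max(group)) for group in row_groups]


def groups_to_scan(zone_maps: List[ZoneMap], lo: int, hi: int) -> List[int]:
    """Indices of row groups whose [min, max] overlaps the filter lo <= x <= hi."""
    return [
        i for i, zone in enumerate(zone_maps)
        if zone.max_value >= lo and zone.min_value <= hi
    ]


def filtered_sum(row_groups, zone_maps, lo, hi):
    """Sum values in [lo, hi], touching only row groups that survive pruning."""
    total = 0
    for i in groups_to_scan(zone_maps, lo, hi):
        total += sum(v for v in row_groups[i] if lo <= v <= hi)
    return total


if __name__ == "__main__":
    row_groups = [[1, 2, 3], [10, 11, 12], [20, 25, 30]]
    zone_maps = build_zone_maps(row_groups)
    survivors = groups_to_scan(zone_maps, 11, 21)
    print(f"scan groups {survivors}, skip {len(row_groups) - len(survivors)}")
    print(filtered_sum(row_groups, zone_maps, 11, 21))
```

In a GPU engine the payoff is that pruned groups are never decompressed or transferred. Pruning works best when data is sorted or clustered on the filtered column.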

NVIDIA reports GQE running all 22 benchmark queries in 9.0 seconds on one B200 GPU in a GB200 NVL4 server, versus 74.0 seconds and 70.6 seconds for DuckDB on single- and dual-socket AMD EPYC 9755 (Turin) CPU configurations. It also states the results are derived from TPC-H but are not comparable to official TPC-H results because they do not comply with the TPC-H specification. 6
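The total-runtime ratios implied by those figures can be checked directly. Neither ratio has to equal the 7.5x aggregate figure in the summary table, since how NVIDIA aggregates per-query speedups is not stated here. The variable names are illustrative.

```python
GQE_SECONDS = 9.0
DUCKDB_SECONDS = {"single-socket": 74.0, "dual-socket": 70.6}


def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline runtime to accelerated runtime."""
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    for config, seconds in DUCKDB_SECONDS.items():
        print(f"{config}: {speedup(seconds, GQE_SECONDS):.2f}x")
```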

For builders, this is a reminder that AI infrastructure work often spills into ordinary data infrastructure. If retrieval, analytics, or feature computation is becoming the bottleneck, the interesting question may be how much data you can avoid moving before the GPU ever runs a kernel.

GitHub brings license checks into pull requests

GitHub's license-compliance post is not an AI story, but it belongs in a builder digest because dependency intake is part of open-source operations. GitHub says its License Compliance feature is available for GitHub Enterprise Cloud customers across repositories with an active GitHub Advanced Security Code Security license. The feature checks pull requests that add dependencies and flags packages whose licenses do not comply with organization policy. 7

The rollout pattern is practical. GitHub says its own OSPO started in Evaluate mode, which annotated pull requests without blocking merges, then moved toward a state where alerts mostly represented unusual, missing, or explicitly disallowed licenses. Developers can request policy exceptions, and reviewers can approve exceptions at enterprise or repository scope. 7

If you maintain internal packages or closed-source commercial products, this is a good place to reduce manual review load. Do not start by blocking every pull request. Seed a policy with common permissive licenses, run evaluate mode, measure false positives, and write down the exception path before turning enforcement on.
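The rollout steps above can be prototyped before touching any real enforcement setting. The sketch below is a hypothetical policy checker and not GitHub's feature or API. The license identifiers are SPDX-style strings, and the mode names simply mirror the evaluate-then-enforce pattern described in the post.

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Mode(Enum):
    EVALUATE = "evaluate"  # annotate only, never block
    ENFORCE = "enforce"    # block the merge when there are findings


# Seed policy with common permissive licenses (an illustrative starting set).
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


def check_dependencies(
    new_deps: Dict[str, Optional[str]],
    allowed: Set[str],
    exceptions: Set[str],
    mode: Mode,
) -> Tuple[List[Tuple[str, str]], bool]:
    """Return (findings, blocked) for the dependencies a pull request adds.

    new_deps maps package name to its license identifier, or None if missing.
    Packages named in `exceptions` have an approved exception and are skipped.
    """
    findings = []
    for name, license_id in new_deps.items():
        if name in exceptions:
            continue
        if license_id is None:
            findings.append((name, "missing license"))
        elif license_id not in allowed:
            findings.append((name, f"license {license_id} not in policy"))
    blocked = mode is Mode.ENFORCE and bool(findings)
    return findings, blocked


if __name__ == "__main__":
    deps = {"left-pad": "MIT", "gpl-lib": "GPL-3.0-only", "mystery": None}
    print(check_dependencies(deps, ALLOWED_LICENSES, set(), Mode.EVALUATE))
    print(check_dependencies(deps, ALLOWED_LICENSES, {"gpl-lib"}, Mode.ENFORCE))
```

Running in evaluate mode against a few weeks of merged pull requests gives a false-positive count before anything blocks.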

What to do with this

Try Sonnet 5 and Copilot's auto routing on tasks where you can measure completion cost, not just answer quality. For browser-driving agents, test against a staging app with fake credentials before connecting production workflows. For Nano Banana 2 Lite and Gemini Omni Flash, benchmark latency, retry rate, and moderation/provenance behavior as product constraints, not as afterthoughts.

For infrastructure teams, the GQE post and GitHub license workflow point in the same direction: the highest leverage may be in plumbing. Better transfer paths, better metadata, better policy checks, and better routing can matter as much as another model upgrade.