Monday, March 16, 2026

Sunday links – by Scott Cunningham



Good morning!

What a wealth of links about AI, and Claude Code specifically. I don’t understand how Zvi does these tbh. HT: Tyler Ransom

Claude Code, Claude Cowork and Codex #5

It feels good to get back to some of the fun stuff…

6 days ago · 71 likes · 36 comments · Zvi Mowshowitz

A couple of new features in Claude Code:

  1. --dangerously-skip-permissions is now in the desktop app.

  2. Control your sessions remotely from your phone. Once we get this installed and start to learn it, my prediction is that’s game over and the whole world will have adopted this new thing. I just wonder what this new thing will be.

And here it is again. Here’s a prediction (I said this on Twitter): will we see people interrupt the seminar speaker 10 minutes in, saying that they just replicated the paper and it’s wrong?

This article in Nature echoes some of what I’ve been writing about in my fan fiction essays, but it also echoes things many of us in the social sciences have wondered aloud, which is that CC will tend toward datasets that are publicly available, and that therefore proprietary datasets are unlikely to be what CC can chase down. I’d just been thinking of it purely from the perspective of individual advantage. But in reality, in another sense, since CC veers toward publicly available datasets, it means there is likely to be a lot of redundancy and repetition.

Michael Wiebe has a new AER comment in which he re-examines a 2021 paper by Enrico Moretti and finds that the original study had many coding errors and incorrect choices.

What is going on with the event study in Moretti (2021)?

If you’re reading Moretti (2021) (M21) after seeing my comment, you’re probably confused by the Figure 6 event study. The idea is to test whether cluster size affects patenting, using variation in cluster size from inventors who move across cities: “this specification is an event study, based on a “pure” movers design only exploiting variation in clus…

2 days ago · 2 likes · Michael Wiebe

If you haven’t signed up for CodeChella Madrid 3, please do! We’re looking forward to it and think it’ll be a valuable experience for you as well. We offer student (including predoc) and postdoc discounts. A friend told me they think Claude Code has eliminated the value of econometrics workshops, but for now this one exists, so consider coming.

Ars Technica fires a journalist because of a story filled with fabricated quotes. That seems like a pretty deep violation of journalistic ethics given how dependent we are on media in the US for our opinions and beliefs.

Kottke’s blog turns 28. Man, I feel old. That was once part of my daily reading, along with MetaFilter.

McSweeney’s on the Industrial Revolution.

Geez, how was it that no one wrote this paper until now? Incredibly interesting. Zachary Bleemer looks at the long-run effects of the Vietnam War on racial integration, using Josh Angrist’s original dissertation that arguably launched the credibility revolution and Guido Imbens and Josh’s revolutionary work on instrumental variables and the LATE (i.e., the Vietnam draft as an instrument). Check out the abstract. I look forward to seeing this in the QJE or whichever top 5 Zachary decides to send it to. Congratulations, what a fascinating original study.

Did I post this Boris thread about /loop, which does scheduled tasks? I feel like I must have, but then why would it still be in my tabs? Paradox.

I bought these milk peanut butter and chocolate snacks from Trader Joe’s and give them 5 out of 5 on the “PBAC” scale I invented. I would walk back there right now to get more if it weren’t 5:30am.

What has been the effect of “work from home” on loneliness? New NBER working paper.

Learning to recognize the signs of depression setting in, for those who suffer from it. The author calls these moments “drift”. She talks about using the window when it is setting in, but has not yet set in, to adopt practices that keep it at bay.

Apple is planning 3D printed aluminum cases for the iPhone and Apple Watch.

I just must have posted this, but again, had I, then why is it still in my tabs? I hope that’s not a sign of cognitive decline. Small romantic gestures like tucking your partner in or refilling their glass of water activate neural pathways that create bonds in ways that grand romantic gestures don’t and can’t.

Apple to launch a major upgrade to the MacBook Pro later this year called the MacBook Ultra. So maybe hold off on your / my upgrades.

Aniket Panjwani makes a major contribution to the public good with Claude Code resources, including a video on YouTube where he helps economists out there better understand what AI agents do.

Andrew McAfee gives us a list of “AI at work” links, which may interest many. Andrew is also the coauthor, with Erik Brynjolfsson, of this great book with the great title, Race Against the Machine.

This week in “Putting AI to Work”

The Frontier is Advancing Rapidly. As Ethan Mollick writes in his latest Substack post…

2 days ago · 14 likes · 1 comment · Andrew McAfee

GitHub Copilot CLI is out. That’s from February, but I only found it this week.

How AI-generated videos may be distorting your children’s personal development.

Lasala Plaza Hotel in San Sebastián.

Orinoco closes in Brookline.

Ethan Mollick reports that Anthropic has released a new report on making skills using Claude Code that he considers to be excellent. Therefore it almost certainly is.

A review of the MacBook Neo says that Apple’s new inexpensive entry laptop ($599) punches above its weight.

Very interesting article by a therapist about things their clients in their 40s say they regret, none of which is about money or career; rather, they are about initiating friendship versus being passive.

More about the feud between the Pentagon and Claude AI, this time from the New Yorker.

Apparently, Palantir says they’re still using Claude despite the government labeling them a national security risk.

Meta to lay off 20% of its staff, just not in AI.

Old person smell is a real thing and apparently predictable. “Claude, write a skill for gene editing me so that I avoid old man smell”. /smells.

As America migrates away from its bias against big chains, Barnes & Noble recovers.

iFixit says the new MacBook Neo is the most repairable MacBook in years.

Luka on how basketball gives him peace as he navigates his breakup from his fiancée and childhood sweetheart. Completely relate. Not the professional basketball part, despite what you may be thinking. No, but for me, it was the global mission of Mixtape Sessions to help people learn causal inference any way I could.

Yale scientists say nearly half of us age in reverse.

51 funny texts.

The last days of the Spanish dictator Franco.

Three relationship habits that show deep commitment. Number three was interesting and a form of empathy I hadn’t heard of before: when the partner takes your side in the relationship. People in relationships take great leaps to protect the relationship, and while no doubt that trait can be abused by the partner, it’s interesting that there are some views of its value.

And that’s it! Have a great rest of your day!

Overview of Azure Workload Modernization


Azure workload modernization usually means shifting from traditional deployment options, such as running a workload inside a VM, to more cloud-native components, such as functions, PaaS services, and other cloud architecture components.

  • Shift from VMs to PaaS and Cloud-Native Services: By replatforming to services like Azure App Service for web apps, managed databases (e.g. Azure SQL Database), or container platforms (e.g. Azure Kubernetes Service (AKS)), you offload infrastructure management to Azure. Azure handles patches, scaling, and high availability, so your team can focus on code and features. (Learn more: https://learn.microsoft.com/azure/app-modernization-guidance/plan/plan-an-application-modernization-strategy#iaas-vs-paas)
  • Immediately Leverage Azure’s Built-in Capabilities: You can light up Azure’s ecosystem features for security, compliance, monitoring, and more. For example, without changing any code you can enable Azure Monitor for telemetry and alerting, use Azure’s compliance certifications to meet regulatory needs, and turn on governance controls. Modernizing a workload is about unlocking things like auto-scaling, backup/DR, and patch management that will be handled for you as platform features. (See: https://learn.microsoft.com/azure/well-architected/framework/platform-automation)
  • Treat Modernization as a Continuous Journey: Modernizing isn’t a single “big bang” rewrite; it’s an ongoing process. Once on Azure, plan to iteratively improve your applications as new services and best practices emerge. Implement DevOps pipelines (CI/CD) to continuously deliver updates and refactor parts of the system over time. This allows you to adopt new Azure capabilities (such as improved instance types, updated frameworks, or new managed services) with minimal disruption. By continually integrating improvements, from code enhancements to architecture changes, you ensure your workloads keep getting more efficient, secure, and scalable. (See: https://learn.microsoft.com/azure/app-modernization-guidance/get-started/application-modernization-life-cycle – continuous improvement approach)
  • Use Containers and Event-Driven Architectures to Evolve Legacy Apps: Breaking up large, tightly coupled applications into smaller components can greatly improve agility and resilience. Containerize parts of your app and deploy them to a managed orchestrator like Azure Kubernetes Service (AKS) for better scalability and fault isolation. In an AKS cluster, each microservice or module runs independently, so you can update or scale one component without impacting the whole system. In addition, consider introducing serverless functions (via Azure Functions) or event-driven services for specific tasks and background jobs. These approaches enable on-demand scaling and cost efficiency: Azure only runs your code when triggered by events or requests. Adopting microservices and serverless architectures helps your application become more modular, easier to maintain, and automatically scalable to meet demand. (Learn more: https://learn.microsoft.com/azure/architecture/guide/architecture-styles/microservices and https://learn.microsoft.com/azure/azure-functions/functions-overview)
  • Modernize Security and Identity: Update your application’s security posture to align with cloud best practices. Integrate your apps with Microsoft Entra ID for modern authentication and single sign-on, rather than custom or legacy auth methods. This provides immediate improvements like multi-factor authentication, token-based access, and easier user management across cloud services. Additionally, take advantage of Azure’s global networking and security services; for example, use Azure Front Door to improve performance for users worldwide and add a built-in Web Application Firewall to protect against DDoS and web attacks. By using cloud-native security services (such as Azure Key Vault to manage app secrets and certificates, or Microsoft Defender for Cloud for threat protection), you can significantly strengthen your workload’s security while reducing the operational burden on your team. (See: https://learn.microsoft.com/entra/identity/intro and https://learn.microsoft.com/azure/frontdoor/front-door-overview)

 

The Galaxy Buds4 Pro are bigger, better, faster, stronger




2024’s Galaxy Buds 3 Pro were pretty impressive. With their two-driver configuration and the SSC codec, they produced audio with solid clarity and instrumentation, while also offering users some very well-implemented smart features for onboard voice and gesture controls. This year’s Galaxy Buds 4 Pro are, frankly, an iterative upgrade over last year’s, except for three key areas: sound quality, ANC, and durability.

The aesthetic refinements that build on 2024’s design are appreciable. Though I haven’t experienced this issue, some users have reported trouble with the charging contacts at the bottom of the Buds 3 Pro’s stems. This year’s new case design has you placing the buds horizontally in their case, with pogo pins on the stems contacting the charging surface, rather than vertically, with a metal base on the stem that contacts the charging pins at the bottom of the case.

Our extinct Australopithecus relatives may have had difficult births



Illustration of a female Australopithecus sediba carrying an infant

JOHN BAVARO FINE ART/SCIENCE PHOTO LIBRARY

Childbirth was difficult and dangerous for our ape-like ancestors, much as it is for women today. A new study of the pelvises of Australopithecus suggests that labour exerted powerful forces on their pelvic floors – meaning Australopithecus mothers risked perineal tearing.

“We show that Australopithecines are quite similar to modern humans,” says Pierre Frémondière, a midwife at Aix-Marseille University in France. “If they had multiple deliveries, probably they would have a higher risk of pelvic floor dysfunction.”

For modern humans, vaginal childbirth requires a lot of force, as a large-headed baby is forced through a relatively narrow pelvis. One region that is vulnerable to damage is the pelvic floor, a sheet of muscles that links the left and right halves of the pelvis. Many women tear their pelvic floor during labour, and it has been estimated that 1 in 4 women experience pelvic floor disorders such as incontinence or organ prolapse.

Frémondière and his colleagues wanted to find out if similar difficulties troubled our extinct ancestors. They focused on Australopithecus, which lived in Africa between about 2 million and 4 million years ago. These early hominins walked upright but were also still adapted to spend time in trees, and may have made and used stone tools. They may have been the ancestors of Homo, the genus to which we belong.

Based on the handful of Australopithecus pelvises that have been found, the team knew that the Australopithecus birth canal was oval: it was wide from left to right, but narrow from front to back. Non-human primates like chimpanzees have the opposite set-up, while the modern human birth canal is more circular.

To investigate what would happen in Australopithecus labour, the team simulated the pelvises of three individuals from different species: Australopithecus afarensis, Australopithecus africanus and Australopithecus sediba. To model the pelvic floor muscles, the researchers took an MRI scan of a pregnant woman, extracted the three-dimensional image of the pelvic floor, and morphed it to fit the Australopithecus pelvises. Then they simulated a baby being pushed through the pelvises, and estimated how much force would be exerted on the pelvic floor.

They found that the Australopithecus pelvic floors experienced forces of 4.9 to 10.7 megapascals (MPa), similar to the 5.3 to 10.5 MPa exerted on the human pelvic floor during labour.

The team did well to use multiple Australopithecus pelvises, and to make the comparison to data from a live human birth, says Lia Betti at University College London. “This is a really good way of checking that your model is robust.”

Despite that, Betti is cautious about the results. She says we don’t know if the pelvic floor muscles of Australopithecus differed from ours, which could have made them more or less resilient to tearing. Also, as a check, the team modelled two modern human births, and in one case the baby didn’t rotate in the birth canal as babies do in real life. This suggests that the simulations are missing key factors, she says.

“The problem is just we do not have a huge amount of evidence,” says Betti. Three Australopithecus pelvises – all from different species – is a small dataset. There are no known pelvises from earlier hominin species.

“I think that we’re just at the beginning of this kind of study,” says Frémondière.

New Scientist. Science news and long reads from expert journalists, covering developments in science, technology, health and the environment on the website and the magazine.


Chris Sims – RIP | econometrics.blog



I was saddened to hear of Chris Sims’s passing yesterday. Although I’m not a macroeconometrician, his work has strongly influenced the way I think about econometrics. I covered his well-known helicopter tour paper on this blog a while back. Some of my other favorites are unpublished notes or slides from his website, many of them with a philosophical bent. Thinking about instrumental variables is a paper I read in grad school that really clarified why things can go so badly wrong in IV estimation. I read Understanding Non-Bayesians for the first time a few years ago and wished I had read it sooner. It articulates a view of Bayesian econometrics that I find particularly compelling. Robins-Wasserman, Round N and Sharp Econometrics have also shaped the approach I’m taking in some recent work with Laura Liu.

I didn’t know Chris well, but I met him a few times early in my career. One of those meetings is particularly memorable. I was invited to give a seminar at Princeton on relatively short notice and was feeling apprehensive. The material that I had ready to present was preliminary and a bit unusual: it included parts of my paper on disciplining beliefs as well as some more traditional results that eventually made their way into this paper.

When I saw that my first meeting of the day was with a Nobel laureate whom I’d never met before, I was even more nervous! But Chris was great; within a few minutes I felt completely comfortable talking with him and was genuinely surprised to see how interested he was in hearing my views on econometrics. I don’t mean to say that my views were particularly insightful or interesting. I think Chris just genuinely enjoyed engaging with people about econometrics and puzzling through the issues that he found interesting and important.

Near the beginning of my seminar, someone in the audience asked me a slightly pointed question, not rude but definitely skeptical of my approach. For whatever reason, I got flustered and flubbed my response. I couldn’t seem to articulate my point, and then bungled my response to a follow-up question as well. Just when I started to worry that my seminar was going off the rails, Chris chimed in and clarified the point I had been struggling to articulate. I remember feeling incredibly relieved, like someone had just pressed a reset button for my talk. It was just a small thing, but it meant a lot to me at a time in my career when I was feeling anything but confident about my ideas and my work.

Threat Protection Updates in Secure Firewall 10.0



As the needs and complexity of organizational networks grow, there is a greater need to provide threat protections that span various use cases, architectures, and attack vectors. The new capabilities found in our latest firewall software release, Cisco Secure Firewall version 10.0, expand the scope of protection to include previously unidentified threats, more accurately match security rules to users and applications, and provide better threat detection capabilities for clustered firewall architectures.

You can test drive these capabilities today with Secure Firewall Test Drive, an instructor-led course that will guide you through the Secure Firewall and its powerful roles in cybersecurity for your organization.

SnortML enhances the robust capabilities of Secure Firewall’s intrusion prevention engine, Snort 3, by detecting and analyzing zero-day threats in-line. Because this technology uses machine learning to actively identify threats instead of pattern-matching rules, it is able to detect threats for which there may not yet be a traditional pattern-based rule.

With the release of Cisco Secure Firewall version 10.0, expanded protections covering SQL injection attacks, command injection attacks, and cross-site scripting exploits are now available. You can learn more about SnortML in the Cisco Secure Essentials SnortML section.

App port ID

Cisco AppID allows for the rapid automated classification of network traffic pertaining to specific applications, greatly simplifying the creation and maintenance of policies controlling access to them. Cisco Secure Firewall version 10.0 now provides default port specifications for applications so that new policies set for these applications can be scoped to the default ports they use. This bolsters security by ensuring that rules specified with applications are properly scoped: they apply only to traffic on the ports the application is likely to use and do not unintentionally apply to unrelated traffic. It also may improve performance in busy networks with more complex security policies. This behavior can be altered so that the policies apply to all ports if desired. To ensure consistent operations, existing AppID rules will not be modified.
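To make the scoping behavior concrete, here is a minimal Python sketch. It is not Cisco's implementation or API: the `AppRule` type, the `DEFAULT_PORTS` table, and the `default_ports_only` switch are hypothetical names that only model the idea of matching an application rule against its default ports unless the rule is widened to all ports.

```python
from dataclasses import dataclass

# Hypothetical default-port table for illustration only; the real
# application definitions ship with the firewall software.
DEFAULT_PORTS = {"ssh": {22}, "dns": {53}, "https": {443}}


@dataclass(frozen=True)
class AppRule:
    app: str
    action: str                       # e.g. "allow" or "block"
    default_ports_only: bool = True   # set False to apply on all ports

    def matches(self, detected_app: str, dst_port: int) -> bool:
        """Return True if this rule applies to the given flow."""
        if detected_app != self.app:
            return False
        if not self.default_ports_only:
            return True
        return dst_port in DEFAULT_PORTS.get(self.app, set())


scoped = AppRule(app="ssh", action="allow")
unscoped = AppRule(app="ssh", action="allow", default_ports_only=False)

print(scoped.matches("ssh", 22))      # flow on the application's default port
print(scoped.matches("ssh", 2222))    # same application, non-default port
print(unscoped.matches("ssh", 2222))  # rule widened to all ports
```

In this model a scoped rule ignores traffic on non-default ports even when the application is detected there, which is the "properly scoped" behavior the paragraph above describes.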

Modern users frequently move between networks, changing VLANs, IP addresses, and device profiles, making it challenging to provide DNS filtering rules to scoped groups of users. DNS filtering rules are a critical part of organizational security, providing the ability to block or redirect domains based on individual domains, known bad actor domains, or categories of websites.

Security Group Tags (SGTs) address the shifting nature of modern users’ connections by anchoring to a verified user identity instead of the ever-changing network attributes. Cisco Secure Firewall version 10.0 ties DNS filtering to SGTs, enabling seamless and accurate policy application as the user moves across networks.

Cisco Secure Firewall protects organizations against undesired portscans, where tools rapidly probe thousands of ports across network devices to seek out open communication paths and potential exploit vectors. Cisco Secure Firewall version 10.0 brings new capabilities for clustered firewall configurations, allowing identification of portscan attempts even when the connections are distributed among firewalls in a cluster. This ensures clustered configurations can rapidly identify and enact protections against these potentially exploitative efforts.
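The reason cluster-wide visibility matters can be shown with a small Python sketch. It is an illustration under stated assumptions, not how Secure Firewall detects portscans: the `ConnEvent` record, the `find_scanners` function, the 10-second window, and the 5-port threshold are all hypothetical. The point is only that a scan split across cluster members can stay under a per-node threshold while exceeding it once events are merged.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class ConnEvent(NamedTuple):
    node: str      # cluster member that saw the connection
    src_ip: str
    dst_port: int
    ts: float      # seconds


def find_scanners(events: Iterable[ConnEvent], window: float = 10.0,
                  port_threshold: int = 5) -> set[str]:
    """Flag sources that touch at least `port_threshold` distinct ports
    within `window` seconds, counting all supplied events together."""
    by_src = defaultdict(list)
    for ev in events:
        by_src[ev.src_ip].append(ev)

    scanners = set()
    for src, evs in by_src.items():
        evs.sort(key=lambda e: e.ts)
        start = 0
        ports_in_window = defaultdict(int)
        for ev in evs:
            ports_in_window[ev.dst_port] += 1
            # Drop events that fell out of the sliding time window.
            while ev.ts - evs[start].ts > window:
                old = evs[start]
                ports_in_window[old.dst_port] -= 1
                if ports_in_window[old.dst_port] == 0:
                    del ports_in_window[old.dst_port]
                start += 1
            if len(ports_in_window) >= port_threshold:
                scanners.add(src)
                break
    return scanners


# One source probes six ports, alternating between two cluster members;
# another source makes repeated connections to a single port.
events = [ConnEvent("fw-a" if i % 2 == 0 else "fw-b", "10.0.0.9", 20 + i, float(i))
          for i in range(6)]
events += [ConnEvent("fw-a", "10.0.0.5", 443, float(i)) for i in range(6)]

merged = find_scanners(events)
per_node = {n: find_scanners(e for e in events if e.node == n)
            for n in ("fw-a", "fw-b")}
print(merged, per_node)
```

Each node on its own sees only three distinct ports from the probing source, so neither flags it; the merged view does.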

Using AIOps in Cisco Security Cloud Control, the process to upgrade your Secure Firewall software is streamlined and device-personalized. Upgrade workflows are now 90% faster. Security Cloud Control is a unified management interface that provides advanced protection, simplified operations, and real-time intelligence for a more secure, scalable future.

Want to dive deeper into Cisco firewalls? Sign up for the Cisco Secure Firewall Test Drive, an instructor-led, four-hour hands-on course where you’ll experience the Cisco firewall technology in action and learn about the latest security challenges and attacker techniques.





We ran 16 AI Models on 9,000+ Real Documents. Here’s What We Found.





Picking a document AI model is hard. Every vendor claims 95%+ accuracy. General-purpose benchmarks test reasoning and code, not whether a model can extract a complex table from a scanned invoice.

So we built the Intelligent Document Processing (IDP) Leaderboard.

3 open benchmarks. 16+ models. 9,000+ real documents. The tasks that matter: OCR, table extraction, key information extraction, visual QA, and long document understanding.

The goal isn’t to give you one number and declare a winner. It’s to let you dig into the specifics. See where each model is strong, where it breaks, and decide for yourself which one fits your documents.

The results surprised us. The #7 model scores higher than #1 on one benchmark. Sonnet beats Opus. Nanonets OCR2+ matches frontier models at less than half the cost.

Why 3 benchmarks?

Each benchmark measures something different. Use one and you only see one dimension. So we used three.

OlmOCR Bench: Can you reliably parse a messy page? Dense LaTeX, degraded scans, tiny-font text, multi-column reading order. Models that excel at one often fail at another. This dataset contains a diverse set of PDFs.

OmniDocBench: Does the model understand the document’s structure? Formulas, tables, reading order. Layout comprehension, not just character recognition.

IDP Core: Can you extract what a business actually needs? This one is ours. Invoices, handwritten text, ChartQA, DocVQA, 20+ page documents, six types of tables. The stuff that breaks production pipelines. These tasks are more reasoning-heavy than those in the other two benchmarks.

Each model gets a capability profile across six sub-tasks: text extraction, formula handling, table understanding, visual QA, layout ordering, and key information extraction.
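As a rough illustration of what such a profile could look like in code, here is a short Python sketch. How the leaderboard actually aggregates scores is not described here, so the `capability_profile` function, the key names, and the simple mean are assumptions, and the numbers are placeholders for a hypothetical model, not leaderboard results.

```python
from statistics import mean

# The six sub-tasks named in the text; the snake_case keys are our own.
SUB_TASKS = (
    "text_extraction", "formula_handling", "table_understanding",
    "visual_qa", "layout_ordering", "key_information_extraction",
)


def capability_profile(scores: dict[str, float]) -> dict:
    """Summarize one model's per-sub-task scores (0-100)."""
    missing = [t for t in SUB_TASKS if t not in scores]
    if missing:
        raise ValueError(f"missing sub-task scores: {missing}")
    ordered = {t: scores[t] for t in SUB_TASKS}
    return {
        "scores": ordered,
        "mean": round(mean(ordered.values()), 2),
        "strongest": max(ordered, key=ordered.get),
        "weakest": min(ordered, key=ordered.get),
    }


# Placeholder numbers for a hypothetical model, not real benchmark scores.
profile = capability_profile({
    "text_extraction": 90.0, "formula_handling": 80.0,
    "table_understanding": 85.0, "visual_qa": 60.0,
    "layout_ordering": 75.0, "key_information_extraction": 90.0,
})
print(profile["mean"], profile["strongest"], profile["weakest"])
```

Keeping the per-task scores next to the mean is the point: two models with the same average can have very different weak spots.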

Explore each model’s capability profile at: IDP Leaderboard

What does the leaderboard actually let you do?

Most leaderboards give you a table. You look at it. You pick the top model. You move on. It feels like being a bystander rather than hands-on.

Try it yourself here

We wanted something more transparent and hands-on than that.

For that we created the Results Explorer, which lets you see actual predictions and compare models on real documents. For any document in the benchmark, you see the ground truth next to every model’s raw output. This lets you inspect and compare the use cases that are relevant to you.

This is powerful because it also lets you question the ground truth and gives you the full picture of what is happening behind the scenes of each benchmark task.

You can see exactly where a model hallucinated a table cell or missed a handwritten word. Here’s an example showing how models handle complex formula extraction.

1v1 Compare puts two models side by side across all six capability dimensions.

How did we run it?

We wanted anyone to be able to run all three benchmarks. So we made setup as close to zero as we could.

Everything pulls from HuggingFace. We pre-rendered all PDFs to PNGs and hosted them at shhdwi/olmocr-pre-rendered so you don’t need a conversion pipeline. IDP Core embeds images directly in the dataset. Nothing to clone yourself or unzip.

The runner works with any model that has an API. Failed runs pick up where they left off.
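One common way to get "pick up where they left off" behavior is an append-only results file that is consulted before each call. The sketch below shows that pattern in Python; it is not the code from the benchmarking repo, and `run_benchmark`, `call_model`, and the JSONL layout are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_benchmark(doc_ids: Iterable[str], call_model: Callable[[str], str],
                  results_path: Path) -> int:
    """Run `call_model` on each document, skipping ones already recorded.

    Returns the number of documents processed in this invocation.
    """
    # Load IDs that a previous (possibly failed) run already finished.
    done = set()
    if results_path.exists():
        with results_path.open() as f:
            for line in f:
                if line.strip():
                    done.add(json.loads(line)["doc_id"])

    processed = 0
    with results_path.open("a") as f:
        for doc_id in doc_ids:
            if doc_id in done:
                continue
            output = call_model(doc_id)  # may raise; earlier results stay on disk
            f.write(json.dumps({"doc_id": doc_id, "output": output}) + "\n")
            f.flush()  # persist each result before moving on
            processed += 1
    return processed
```

Calling `run_benchmark` again with the same `results_path` after a crash re-runs only the documents that have no recorded result.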

Here’s the GitHub repo link to try it yourself: IDP Benchmarking repo

Here’s what stood out.

Gemini 3.1 Pro dominates VQA tasks

Gemini 3.1 Pro scores 85 in VQA, well above any other model. The closest is GPT-5.4 at 78.2. All the other models are in the 60s.

Here’s a reasoning question based on ChartVQA

This is also seen in the latest benchmarks released by Google. Gemini 3.1 Pro is better at reasoning tasks. The same holds true for DocVQA tasks as well.

Gemini 3.1 Pro is an upgrade on Gemini 3 Pro for VQA tasks

Cheaper models are surprisingly good

This kept coming up.

  • Sonnet 4.6 (80.8) is as good as Claude Opus 4.6 (80.3)
  • Gemini 3 Flash matches Gemini 3 Pro and is sometimes even better (in OmniDocBench)

This may point to something interesting. Cheaper models match expensive ones on extraction: text, tables, layout, formulas. They seem to be reading documents the same way under the hood. The gap only appears when you ask them to reason about what they read. That’s where bigger models pull ahead, and that’s where Gemini 3.1 Pro’s lead actually comes from.

The same is confirmed below by the capability radar comparing Gemini 3.1 Pro and Gemini 3 Flash:

Gemini 3 Flash matches Gemini 3.1 Pro in everything except Visual QA

Cost changes the math

Here’s the part that matters if you’re processing documents at any real volume.

The Nanonets OCR2+ model strikes a good balance between accuracy and cost at scale. Click here for the model’s full profile

The place issues nonetheless break!

Sparse, unstructured tables stay the toughest extraction process.

Most fashions land beneath 55%. These are tables the place cells are scattered, many are empty, and there aren’t any gridlines to information the mannequin. Solely Gemini 3.1 Professional and GPT-5.4 constantly deal with them at 94% and 87% respectively, nonetheless nicely beneath their 96%+ on dense structured tables

Click on Right here to test the Gemini 3.1-pro outputs on lengthy sparse docs

Here is how different fashions break

Here is how a protracted sparse desk appears to be like. Gemini 3.1 Professional crushes it.

Handwriting OCR hasn’t crossed 76%. The best model is Gemini 3.1 Pro at 75.5%. Digital printed OCR is 98%+ for frontier models. Handwriting is a fundamentally different problem and no model has cracked it.

Chart question answering is unreliable. Nanonets OCR2+ leads at 87%, Claude Sonnet follows at 85%, GPT-5.4 drops to 77%.

The failures are specific: axis values misread by orders of magnitude, the wrong bar selected, off-by-one errors on closely spaced data points.

Nanonets OCR2+ performing better than Gemini 3 Flash on Chart VQA questions

Handwritten form extraction hallucinates on blank fields. Every model clusters between 80-84% on this task. The failure mode is consistent: models fill in values for fields that are blank on the form. A name, a date, a status that does not exist in the document.
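One way to check for this failure on your own forms is to compare each predicted field against a hand-labeled copy of the form. The sketch below is illustrative only: the field names and the `hallucinated_fields` helper are hypothetical, and this is not the leaderboard’s scoring code.

```python
from typing import Mapping, Optional


def is_blank(value: Optional[str]) -> bool:
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or not value.strip()


def hallucinated_fields(
    ground_truth: Mapping[str, Optional[str]],
    predicted: Mapping[str, Optional[str]],
) -> list[str]:
    """Return fields that are blank on the form but filled in by the model."""
    return [
        field
        for field, true_value in ground_truth.items()
        if is_blank(true_value) and not is_blank(predicted.get(field))
    ]


if __name__ == "__main__":
    # Hypothetical hand-labeled form: only the name was actually written in.
    truth = {"name": "Ana Ruiz", "date": "", "status": None}
    # Hypothetical model output that invents a date for the blank field.
    pred = {"name": "Ana Ruiz", "date": "2024-01-05", "status": ""}
    print(hallucinated_fields(truth, pred))
```

Any field it returns is one where the form was blank and the model supplied a value anyway.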

Gemini > Claude = OpenAI

The pecking order was settled. Gemini led, Claude followed, OpenAI trailed. GPT-4.1 scored 70.0. Nobody was choosing OpenAI for document work.

With GPT-5.4, table extraction went from 73.1 to 94.8 and DocVQA went from 42.1% to 91.1%. GPT-5.4 got better at understanding documents and reasoning.

The overall scores are now 83.2, 81.0, 80.8. Close enough that the ranking matters less than the shape. Claude leads on formulas. GPT-5.4 leads on tables and QA. Gemini leads on OCR and VQA.

Gemini models just do slightly better overall (because of better VQA)

One thing worth noting: Claude models had stricter content moderation that affected certain documents. Old newspaper scans, textbook pages, and historical documents sometimes triggered filters. This hurt Claude’s scores (only in OmniDoc and OlmOCR).

Now, Which Model Should You Pick?

Every vendor will tell you their model is 95%+ accurate. On structured tables and printed text, they may be right. On sparse tables, handwritten forms, and 20-page contracts, most models struggle.

Running a high-volume OCR pipeline? Nanonets OCR2+ gives you top-tier accuracy at $10 per thousand pages.

Processing complex tables or need high accuracy on reasoning over documents? Gemini 3.1 Pro is worth the premium at $28/1K pages.

Building a simple extraction workflow on a budget? Sonnet and Flash match their expensive siblings on extraction tasks. Nanonets OCR2+ fits here too, with strong accuracy without the frontier price tag.
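To see how those per-page prices play out at volume, here is a rough calculation. It uses only the two prices quoted above ($10 and $28 per 1,000 pages); the 500,000-page monthly volume and the function names are assumptions made up for illustration.

```python
# Per-1K-page prices as quoted in the text; no other prices are assumed.
PRICE_PER_1K_PAGES = {
    "Nanonets OCR2+": 10.0,
    "Gemini 3.1 Pro": 28.0,
}


def monthly_cost(pages_per_month: int, price_per_1k: float) -> float:
    """Cost in dollars of processing `pages_per_month` pages at a per-1K price."""
    return pages_per_month / 1000 * price_per_1k


def cost_table(pages_per_month: int) -> dict[str, float]:
    """Monthly cost for each model at the given volume."""
    return {
        name: monthly_cost(pages_per_month, price)
        for name, price in PRICE_PER_1K_PAGES.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 500,000 pages per month.
    for name, cost in cost_table(500_000).items():
        print(f"{name}: ${cost:,.0f} per month")
```

At that assumed volume the bill is $5,000 versus $14,000 a month, so the premium only makes sense if your documents actually need the stronger reasoning.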

But don’t take our word for it. The leaderboard has the scores. The Results Explorer has the actual predictions. Pick a task that matches your workload. Look at what the models output on real documents. Then decide.

What’s next

We will be adding more open-source models and document processing pipeline libraries to the leaderboard soon. If you want a specific model evaluated, request it on GitHub.

We’ll keep refreshing datasets too. Benchmarks that never change become targets for overfitting.

The leaderboard is at idp-leaderboard.org. The results are open. The code is open. Go look at what these models actually do with your kinds of documents. The numbers tell one story. The Results Explorer tells a more honest one.

I went almost a year without actual phone service

While I’ve never spoken about this publicly before, 2022 was a tough year for me. After my Pixel 7 broke, I found myself without a phone, and with plenty of growing family expenses, I didn’t feel like I could just run out and buy one either. What I did have was a collection of older work devices from OnePlus and a few other brands that worked well as Wi-Fi devices but did not support the bands needed to work properly on any US network.

During this time, I saved up for a new device and relied on a combination of my home network, public Wi-Fi networks, and hotspotting off my wife’s phone as a way to manage the situation.

It’s true that for some people, it’s technically possible to get by without phone service these days. That doesn’t mean it’s easy. The adjustment period was rough, but I have to admit that the experience wasn’t all bad.

Learning how to do things without constant connectivity

Life without phone service isn’t impossible, but it certainly requires adjustments to how you use your phone. This means using alternative methods for calling and texting, as well as planning out your day and week so the lack of phone service doesn’t slow you down.

I opted to use Google Voice at the time, as it allowed me to place phone calls and receive texts without needing a dedicated phone service. Of course, this came with its own obstacles. The biggest downside with most VoIP services is that two-factor authentication support is usually inconsistent at best, and many times, you’ll find a complete lack of support. For most things, I was able to get around this by setting up an authenticator app instead, but I also had to use my wife’s number in a few cases where there was no support for other methods.

To live without constant phone connectivity, you have to plan ahead in a way that most of us take for granted.

Another big challenge is navigating from your phone using GPS. As someone who is naturally anti-directional, I rely heavily on GPS to get around in bigger cities and places I’m less familiar with. The good news is that it’s possible to use Google Maps offline as long as you download local data for your area. You can only store so much map data, and it expires over time, but it’s certainly a usable workaround.

You don’t get live traffic updates and redirects, though, which meant I actually had to pay a bit more attention to where I was going instead of relying solely on the GPS gods to get me there.

Now I’ll admit, it was likely much easier for me to live without phone service than it would be for the average person. After all, I work from home, I live in a smaller community, and there’s an abundance of public Wi-Fi networks at the vast majority of the businesses and entertainment venues I typically find myself in.

For the moments when I knew I wouldn’t have Wi-Fi? I also ensured that I had plenty of time-wasters on my phone. Not just gaming apps: I also learned to use apps like Kiwix for downloading parts of Wikipedia for offline use and Pocket (which is no longer available) for saving articles at home to read in moments where I had no internet access.

The positive side of leaving my phone plan behind

At first, the transition felt a bit scary. At the same time, there was a certain level of peace once I got used to the situation. After all, I was no longer being constantly bombarded by texts, emails, and app alerts around the clock. It was a level of peace I hadn’t experienced in decades.

The end result was that my digital life felt more intentional. It was my choice when to check in and when to ignore. The more time I spent away from constant connectivity, the less important social media and other parts of my digital life felt. Over time, I started spending more time on new physical hobbies like hiking, kayaking, and just living my life.

I also found that because I was forced to use my brain over the power of GPS, my own navigation skills improved quite a bit as well.

Now, it wasn’t always great either. Sometimes I’d miss a semi-important call, or I’d find myself wanting to look up the best restaurant. I couldn’t just do that from anywhere; I had to plan around hotspot access. Still, I feel like without this transition period, I probably would never have rebalanced my digital life the way I did.

Lessons I kept with me after returning to a smartphone plan

About 9 to 10 months in, I finally bought a new flagship phone and activated phone service again. My digital life got easier, but I never forgot what I learned, either.

Even today, I tend to keep my phone in one room. I don’t carry it with me every time I go to the store. I choose intention over constant stimulation. I don’t bring out a phone at the dinner table, and I spend more time doing things that are physical and not just digital. Part of this might have happened anyway, but the habits I formed during my time without a phone plan have really stuck with me.

While I don’t necessarily recommend doing what I did, it is important to intentionally leave your phone behind occasionally.

Does that mean I recommend others ditch their phone plans? Absolutely not. Sure, some folks probably could get away with it, especially if you work from home and have relatively wide access to Wi-Fi when you’re out and about.

Still, I do think many people could benefit from a short time without their phone’s constant access. Leaving your phone at home when you go out with friends or simply keeping your phone in another room for parts of the day can help you unwind the habitual need to constantly check in. For most, a few days is more than enough.

Millions of kids take melatonin but doctors are raising red flags

Melatonin has quickly become one of the most widely used sleep aids for children around the world. Its popularity is largely driven by the belief that it is a natural and easily accessible solution for insomnia. However, new research suggests that its rapid growth in use has moved faster than the scientific understanding of how safe and effective it really is for children over the long term.

Melatonin appears to provide clear benefits for sleep difficulties in children with neurodevelopmental conditions. Yet for children without these conditions, strong evidence remains limited. Researchers are also concerned about inconsistent dosing in over-the-counter products, use without medical supervision, and a growing number of accidental ingestions. Taken together, these concerns point to the need for more caution, stronger regulation, and clearer evidence-based guidance when melatonin is used to treat sleep problems in children.

Why Families Are Turning to Melatonin

Sleep difficulties are becoming increasingly common among children and teenagers. Poor sleep can affect emotional regulation, cognitive development, and overall health. As parents look for quick and convenient solutions, melatonin supplements have become widely used because they are easy to obtain, often come in child-friendly forms, and are widely perceived as a safe alternative to prescription medications.

Despite that perception, melatonin is actually a hormone that affects more than just sleep cycles. It plays roles in regulating the immune system, metabolism, and reproductive processes. Research on melatonin use in children is still uneven. Many studies only examine short-term outcomes or focus on specific clinical populations. Because of these limitations, researchers say there is an urgent need to carefully evaluate the safety, effectiveness, and appropriate use of melatonin in pediatric care.

Review Examines Global Melatonin Use in Children

A narrative review published in World Journal of Pediatrics by researchers at Boston Children’s Hospital explored the rapid rise of melatonin use among children and adolescents worldwide. The review analyzed scientific evidence related to melatonin’s effectiveness, safety profile, and patterns of real-world use.

Researchers found a clear mismatch between the widespread use of melatonin and the limited amount of long-term scientific data available. The review also highlighted concerns about inappropriate use, inconsistent product quality, and the lack of strong regulatory oversight for sleep supplements marketed to children.

Evidence Shows Benefits for Some Children

According to the review, melatonin use among children has increased sharply over the past decade. This growth is especially noticeable in countries where the supplement is sold over the counter.

Strong scientific evidence supports melatonin’s short-term benefits for children with neurodevelopmental disorders such as autism and attention deficit hyperactivity disorder. In these cases, melatonin can help children fall asleep more quickly, extend total sleep time, and improve overall quality of life for caregivers.

Limited Data for Typically Developing Children

The situation is less clear for children who do not have underlying developmental conditions. Research in this group is limited and often inconsistent. Most randomized clinical trials have been short in duration and have focused primarily on older children or teenagers. As a result, researchers cannot draw strong conclusions about younger children, even though melatonin use in that age group is becoming more common.

Long-term safety data are especially limited. Scientists still have unanswered questions about whether melatonin might affect puberty, immune function, metabolism, or neurological development when used over extended periods.

Safety Concerns About Melatonin Products

The review also highlights several safety issues that may arise outside controlled clinical environments. Testing of commercial melatonin supplements has revealed major differences between labeled doses and the actual amount of melatonin contained in some products. In some cases, supplements contained several times the stated dose or unexpected compounds such as serotonin.

Data from pediatric poison control centers also show a sharp increase in accidental melatonin ingestions among children. Young children appear particularly vulnerable, often because of gummy formulations that resemble candy and improper storage at home. These findings suggest that the risks associated with real-world melatonin use may be greater than previously assumed.

Experts Urge Cautious and Limited Use

Researchers caution that melatonin should not be treated as a quick fix for childhood sleep problems. Although it can be useful in certain carefully selected situations, particularly when guided by a healthcare professional, it should not replace thorough sleep assessments or behavioral interventions.

The review stresses that both clinicians and caregivers should view melatonin as a biologically active hormone rather than a harmless supplement. Without stronger evidence and better regulation, routine or unsupervised use could expose children to unnecessary risks while drawing attention away from proven non-pharmacological strategies that support healthy sleep.

Behavioral Sleep Strategies Remain First-Line Treatment

The findings have important implications for pediatric medicine, public health policy, and caregiver education. Behavioral approaches to sleep should remain the primary treatment for childhood insomnia. These strategies include maintaining consistent bedtime routines, limiting screen exposure before bed, and setting age-appropriate sleep expectations.

If melatonin is used, the review recommends starting with the lowest effective dose, limiting the duration of treatment, and using it only under medical supervision. Researchers also emphasize the need for stronger oversight of melatonin products designed for children, clearer labeling standards, and more long-term clinical research. These steps could help ensure that children receive safe, effective, and evidence-based support for healthy sleep.

3 Questions: On the future of AI and the mathematical and physical sciences | MIT News

Curiosity-driven research has long sparked technological transformations. A century ago, curiosity about atoms led to quantum mechanics, and eventually the transistor at the heart of modern computing. Conversely, the steam engine was a practical breakthrough, but it took fundamental research in thermodynamics to fully harness its power.

Today, artificial intelligence and science find themselves at a similar inflection point. The current AI revolution has been fueled by decades of research in the mathematical and physical sciences (MPS), which provided the challenging problems, datasets, and insights that made modern AI possible. The 2024 Nobel Prizes in physics and chemistry, recognizing foundational AI methods rooted in physics and AI applications for protein design, made this connection impossible to miss.

In 2025, MIT hosted a Workshop on the Future of AI+MPS, funded by the National Science Foundation with support from the MIT School of Science and the MIT departments of Physics, Chemistry, and Mathematics. The workshop brought together leading AI and science researchers to chart how the MPS domains can best capitalize on, and contribute to, the future of AI. Now a white paper, with recommendations for funding agencies, institutions, and researchers, has been published in Machine Learning: Science and Technology. In this interview, Jesse Thaler, MIT professor of physics and chair of the workshop, describes key themes and how MIT is positioning itself to lead in AI and science.

Q: What are the report’s key themes regarding last year’s gathering of leaders across the mathematical and physical sciences?

A: Gathering so many researchers at the forefront of AI and science in one room was illuminating. Though the workshop participants came from five distinct scientific communities (astronomy, chemistry, materials science, mathematics, and physics), we found many similarities in how we are each engaging with AI. A real consensus emerged from our animated discussions: Coordinated investment in computing and data infrastructure, cross-disciplinary research methods, and rigorous training can meaningfully advance both AI and science.

One of the central insights was that this needs to be a two-way street. It’s not just about using AI to do better science; science can also make AI better. Scientists excel at distilling insights from complex systems, including neural networks, by uncovering underlying principles and emergent behaviors. We call this the “science of AI,” and it comes in three flavors: science driving AI, where scientific reasoning informs foundational AI approaches; science inspiring AI, where scientific challenges push the development of new algorithms; and science explaining AI, where scientific tools help illuminate how machine intelligence actually works.

In my own field of particle physics, for instance, researchers are developing real-time AI algorithms to handle the data deluge from collider experiments. This work has direct implications for discovering new physics, but the algorithms themselves turn out to be valuable well beyond our field. The workshop made clear that the science of AI needs to be a community priority: it has the potential to transform how we understand, develop, and control AI systems.

Of course, bridging science and AI requires people who can work across both worlds. Attendees consistently emphasized the need for “centaur scientists,” researchers with real interdisciplinary expertise. Supporting these polymaths at every career stage, from integrated undergraduate courses to interdisciplinary PhD programs to joint faculty hires, emerged as essential.

Q: How do MIT’s AI and science efforts align with the workshop recommendations?

A: The workshop framed its recommendations around three pillars: research, talent, and community. As director of the NSF Institute for Artificial Intelligence and Fundamental Interactions (IAIFI), a collaborative AI and physics effort among MIT and Harvard, Northeastern, and Tufts universities, I’ve seen firsthand how effective this framework can be. Scaling this up to MIT, we can see where progress is being made and where opportunities lie.

On the research front, MIT is already enabling AI-and-science work in both directions. Even a quick scroll through MIT News shows how individual researchers across the School of Science are pursuing AI-driven projects, building a pipeline of knowledge and surfacing new opportunities. At the same time, collaborative efforts like IAIFI and the Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute focus interdisciplinary energy for greater impact. The MIT Generative AI Impact Consortium is also supporting application-driven AI work at the university scale.

To foster early-career AI-and-science talent, several initiatives are training the next generation of centaur scientists. The MIT Schwarzman College of Computing’s Common Ground for Computing Education program helps students become “bilingual” in computing and their home discipline. Interdisciplinary PhD pathways are also gaining traction; IAIFI worked with the MIT Institute for Data, Systems, and Society to create one in physics, statistics, and data science, and about 10 percent of physics PhD students now opt for it, a number that is likely to grow. Dedicated postdoctoral roles like the IAIFI Fellowship and Tayebati Fellowship give early-career researchers the freedom to pursue interdisciplinary work. Funding centaur scientists and giving them space to build connections across domains, universities, and career stages has been transformative.

Finally, community-building ties it all together. From focused workshops to large symposia, organizing interdisciplinary events signals that AI and science isn’t siloed work; it’s an emerging field. MIT has the talent and resources to make a significant impact, and hosting these gatherings at multiple scales helps establish that leadership.

Q: What lessons can MIT draw about further advancing its AI-and-science efforts?

A: The workshop crystallized something important: The institutions that lead in AI and science will be the ones that think systematically, not piecemeal. Resources are finite, so priorities matter. Workshop attendees were clear about what becomes possible when an institution coordinates hires, research, and training around a cohesive strategy.

MIT is well positioned to build on what’s already underway with more structural initiatives: joint faculty lines across computing and scientific domains, expanded interdisciplinary degree pathways, and deliberate “science of AI” funding. We’re already seeing moves in this direction; this year, the MIT Schwarzman College of Computing and the Department of Physics are conducting their first-ever joint faculty search, which is exciting to see.

The virtuous cycle of AI and science has the potential to be truly transformative, offering deeper insight into AI, accelerating scientific discovery, and producing robust tools for both. By developing an intentional strategy, MIT will be well positioned to lead in, and benefit from, the coming waves of AI.