
When AI Meets Law: Building Trust in the Algorithms that May Shape Justice

by Jed Stiglitz, Richard and Lois Cole Professor of Law, Associate Dean for Academic Affairs, and Director of the Center for Law and AI

A New Frontier for Law and Technology

In the past few years, lawyers have started experimenting with artificial intelligence in ways that once sounded like science fiction. There are programs that draft contracts, summarize discovery documents, and even pass the bar exam. Law firms experiment with AI-enhanced associates, and courts around the world quietly explore tools that could help manage overwhelming dockets. There is even a chance that AI will help close the “justice gap” between the wealthy and the less well off.

The appeal of AI in law is obvious: the law runs on text, and AI runs on text. Millions of judicial opinions, statutes, and briefs seem to form the perfect raw material for algorithms designed to read, reason, and write. The bright vision is that machines could make the legal system faster, cheaper, and—hopefully—more just.

Yet the closer one looks, the more inescapable the perils of AI appear. A model that can generate a plausible-sounding legal brief can also invent citations or subtly misstate precedent. A tool designed to “streamline” judicial review can just as easily undermine it. And unlike many AI applications, the consequences of error in law are measured not in typos or consumer mistakes, but in livelihood or liberty. Moreover, as in healthcare, there are severe asymmetries of information in law: clients cannot easily tell good advice from bad, or good counsel from bad, at least until it is too late.

At Cornell, we began our collaboration—linking the Law School, the Department of Information Science, and the Cornell Tech campus—with a simple question: Why is law such a hard domain for AI? What we found is that law poses not only technical challenges, but also conceptual and ethical ones. Building trustworthy AI for law will depend as much on social insight and professional values as on computational power.

Why Law Resists Automation

In trying to understand law’s resistance to automation, we found ourselves drawn to an older analogy: medicine.

Over sixty years ago, economist Kenneth Arrow asked why the market for healthcare could never behave like the market for ordinary goods. His answer was that medicine involved deep uncertainty, high stakes, and a profound asymmetry of information. Patients rely on doctors not only for treatment but for judgment.

Law is much the same. Both fields require years of specialized training. Both face problems whose diagnoses and solutions are uncertain. Both depend on professional norms—trust, confidentiality, fiduciary duty—that go beyond simple market-based values of efficiency. And in both, the people who rely on expertise often cannot tell when they are being misled.

That asymmetry is precisely what makes AI so risky in law. When a chatbot suggests a restaurant that turns out to be closed, the harm is trivial. When it suggests a course of action that leads someone to waive their legal rights or miss a filing deadline, the harm is lasting and severe. A client or pro se litigant will not know when the answer is wrong.

As with healthcare, the stakes of failure are high, and the room for ambiguity is large. That combination—uncertainty, asymmetry, and consequence—is what makes law different from other arenas where AI has thrived as a consumer-facing tool.

Three Human Tasks Machines Can’t Yet Replace

If AI is to contribute meaningfully to legal work, it must navigate three deeply human tasks: curating data, annotating meaning, and verifying truth. 

1. Data Curation: Teaching the Machine to Read

Before an AI system can reason about the law, it has to read it—and that is harder than it sounds. Legal documents are scattered across jurisdictions, buried behind paywalls, and often trapped in inconsistent digital formats. Some courts post searchable PDFs; others upload scanned photocopies; many others insist you visit them in person to use their photocopier. Even when the technical hurdles are overcome, legal curation raises ethical ones. Client documents are confidential. Proprietary databases guard their collections. For now, data curation remains one of the largest—and least glamorous—bottlenecks in the field.
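To make that bottleneck concrete, here is a minimal sketch of the first step such a curation pipeline might take: checking whether a court PDF carries an embedded text layer, and falling back to optical character recognition when it is only a scanned image. The libraries (pdfplumber, pytesseract) and the file name are illustrative assumptions, not the tools used in our project.

```python
# A sketch only: detect whether a court PDF has an embedded text layer,
# and OCR the page when it is merely a scanned photocopy.
# pdfplumber, pytesseract, and "sample_opinion.pdf" are illustrative choices.
import pdfplumber
import pytesseract


def extract_opinion_text(pdf_path: str, min_chars_per_page: int = 200) -> str:
    """Return document text, OCR-ing pages that lack a usable text layer."""
    pages_text = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            if len(text) < min_chars_per_page:
                # Probably a scanned image: rasterize the page and OCR it.
                image = page.to_image(resolution=300).original
                text = pytesseract.image_to_string(image)
            pages_text.append(text)
    return "\n".join(pages_text)


if __name__ == "__main__":
    print(extract_opinion_text("sample_opinion.pdf")[:500])
```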

2. Data Annotation: Teaching the Machine to Think Like a Lawyer

Even when legal texts are available, understanding them is another matter. AI can easily tag words like “plaintiff” or “defendant,” but grasping the reasoning behind a judicial opinion—distinguishing a textual argument from a purposive one, or a majority from a concurrence—demands judgment.

In earlier work, our group ran a series of experiments asking models to classify Supreme Court opinions according to the type of reasoning they employ. General-purpose systems such as GPT-4 can produce eloquent summaries, but when asked to label reasoning styles, they stumble. A smaller, legally trained model fine-tuned on human-coded examples, by contrast, comes much closer to matching expert performance. The key is human involvement.
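As an illustration of that supervised approach, the sketch below fine-tunes a small legal-domain encoder on a handful of human-coded examples. The checkpoint, the three-label scheme, and the toy passages are assumptions for illustration; they are not the data or models from our experiments.

```python
# A minimal sketch, not the experimental setup from the underlying paper:
# fine-tune a legal-domain encoder on human-coded reasoning labels.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["textual", "purposive", "precedential"]  # hypothetical label set

# In practice, thousands of expert-annotated opinion passages would go here.
train = Dataset.from_dict({
    "text": [
        "The ordinary meaning of the statutory term controls our reading.",
        "Congress enacted the provision to protect consumers from hidden fees.",
    ],
    "label": [0, 1],
})

checkpoint = "nlpaueb/legal-bert-base-uncased"  # illustrative legal-domain model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(LABELS)
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="reasoning-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
    tokenizer=tokenizer,
)
trainer.train()
```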

That result reveals a broader truth: complex legal understanding cannot simply be “scaled.” It has to be taught, domain by domain, by people who know the terrain. Creating those labeled examples is expensive and slow—it requires trained lawyers, not Amazon Mechanical Turk workers—but it is also clarifying. When experts disagree on how to label an opinion, the disagreement often exposes conceptual ambiguities in the law itself.

In that sense, the very process of teaching machines to recognize legal reasoning may sharpen our own understanding of it. Annotation becomes not just a technical exercise but a form of jurisprudential inquiry. That, at least, is what we found.

3. Verification: Teaching the Machine to Tell the Truth

The third and hardest task is ensuring that AI outputs are verifiable.

Seasoned lawyers now double-check the citations generated by commercial AI research tools. Studies have found that such systems fabricate cases as often as one-third of the time. These so-called “hallucinations” are not malicious—they are artifacts of models that generate fluent language without genuine understanding—but their consequences in legal settings can be catastrophic.
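One mechanical safeguard this suggests is checking every citation in an AI-drafted document against a trusted index of real cases. The sketch below does this with a simplified citation pattern and a toy index; both are assumptions, and a real system would query a comprehensive case-law database.

```python
# A minimal sketch of one safeguard: extract reporter citations from an
# AI-drafted brief and flag any that are absent from a trusted index.
# The regex and the tiny index below are simplifying assumptions.
import re

CITATION_PATTERN = re.compile(r"\b\d{1,3}\s+(?:U\.S\.|F\.\d?d|S\. Ct\.)\s+\d{1,4}\b")

# Stand-in for a curated database of verified citations.
KNOWN_CITATIONS = {
    "347 U.S. 483",   # Brown v. Board of Education
    "384 U.S. 436",   # Miranda v. Arizona
}

def flag_unverified_citations(draft: str) -> list[str]:
    """Return citation strings in the draft that are not found in the index."""
    found = CITATION_PATTERN.findall(draft)
    return [c for c in found if c not in KNOWN_CITATIONS]

draft = "As held in 347 U.S. 483, and as restated in 512 U.S. 999, the rule applies."
# "512 U.S. 999" is not in the toy index, so it is flagged for human verification.
print(flag_unverified_citations(draft))
```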

Verification, in this context, means more than proofreading. It requires testing whether an output is accurate, consistent, and grounded in real sources. In some domains, those tests are straightforward: a speech-to-text system can be scored on how closely its transcript matches the words actually spoken. But other tasks—like summarizing precedent or predicting risk—are far harder to benchmark.
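For the straightforward case just mentioned, the standard score is word error rate: the word-level edit distance between the machine transcript and the reference, divided by the reference length. A minimal implementation, with invented sample sentences, looks like this:

```python
# Word error rate: the usual benchmark for speech-to-text accuracy.
# The sample transcripts are invented for illustration.
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

reference  = "the witness stated that she never signed the agreement"
hypothesis = "the witness stated that she ever signed agreement"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # 2 errors / 9 words
```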

We are considering hybrid auditing frameworks that combine statistical evaluation with human review. The idea is to keep a human “in the loop,” not merely as a backstop for error, but as an active participant in deciding when a system’s performance is good enough for the task at hand.
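A minimal sketch of that hybrid idea, not the framework itself: score each output automatically and route anything below a task-specific threshold to a human reviewer. The thresholds and the scoring function below are placeholder assumptions.

```python
# Sketch of a human-in-the-loop audit: automated scoring decides which
# outputs a person must review. Thresholds here are placeholders.
from dataclasses import dataclass
from typing import Callable

# Different legal tasks tolerate different error rates.
REVIEW_THRESHOLDS = {
    "classroom_transcription": 0.95,
    "courtroom_transcription": 0.99,
    "contract_first_draft": 0.90,
}

@dataclass
class AuditDecision:
    output: str
    score: float
    needs_human_review: bool

def audit(output: str, task: str, score_fn: Callable[[str], float]) -> AuditDecision:
    """Flag an AI output for human review when its automated score is too low."""
    score = score_fn(output)
    return AuditDecision(output, score, score < REVIEW_THRESHOLDS[task])

# Toy scorer standing in for a real statistical evaluation.
decision = audit("DRAFT TRANSCRIPT", "courtroom_transcription", lambda _: 0.96)
print(decision.needs_human_review)  # True: 0.96 falls below the 0.99 threshold
```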

That threshold varies. A transcription system that is 95 percent accurate may be acceptable in a classroom but not in a courtroom. A model that drafts first-pass contracts may save time for lawyers, yet it must be audited for hidden bias in the clauses it reproduces. Trustworthy AI, like trustworthy law, is contextual: it depends on who uses it, for what purpose, and under what safeguards.

Who Should Use Legal AI?

Our analysis suggests a simple matrix. Picture three users—the public, lawyers, and judges—and two types of activities: tasks and judgments.

For ordinary citizens, AI may help with low-stakes tasks like finding forms, but the risks multiply as the tools venture into advice or prediction. The average person cannot easily tell when the algorithm is wrong, yet the veneer of authority can make errors dangerously persuasive.

Judges face a different dilemma. They are trained to evaluate arguments, yet if they rely on AI to perform the research or synthesis that once shaped their deliberation, the quality of judgment itself could erode. Judicial reasoning is not just about the result; it is about the process of thinking through competing principles. An AI that shortcuts that process might save time but weaken the fairness of legal outcomes and ultimately judicial legitimacy.

Between these two extremes lies the lawyer—the trained intermediary who understands both the limits of the technology and the norms of the profession. For lawyers, AI can act as an exceptionally fast junior associate: tireless, sometimes clumsy, capable of first drafts but in need of supervision. Used well, such tools could reduce costs and narrow the justice gap that leaves most low-income Americans without adequate legal help. Used poorly, they could multiply errors and obscure responsibility.

The guiding principle, we argue, is accountability. When the stakes involve liberty, livelihood, or trust in institutions, humans—not machines—must remain answerable for the result.

What We Can Learn from Medicine

If medicine helps us understand what is hard about law, it may also provide lessons we can borrow.

Over decades, the medical field developed mechanisms to manage uncertainty: peer review, licensing boards, malpractice law, and ethical codes. These institutions do not eliminate error, but they make it visible and correctable. They create a culture of accountability that balances innovation with caution.

Legal AI will require an analogous infrastructure: transparent benchmarks instead of opaque leaderboards; auditing standards akin to clinical trials; and, above all, a commitment to open data and reproducible research. Closed systems may offer short-term convenience, but open ones build collective confidence.

Our call, then, is not just for better technology but for collaboration across disciplines. Lawyers can help define the norms of fairness and confidentiality that AI must respect. Computer scientists can design models that make their reasoning legible. Ethicists can identify where automation threatens fundamental rights. Only by combining these perspectives can we ensure that AI strengthens, rather than erodes, the rule of law.

The Broader Stakes

The story of legal AI is, in miniature, the story of AI itself. As machines take on tasks that once required professional judgment, professions and society must decide which parts of human expertise are truly indispensable.

For law, that question touches the core of democratic governance. Courts derive legitimacy not from efficiency but from deliberation and perceptions of even-handedness. The law’s authority depends on reasoned explanation, not black-box output. If AI is to aid that process, it must embody the same commitment to accountability and fairness that the legal system demands of people.

In the end, the goal should not be to build a “robot judge” or a “robot lawyer.” It is to design tools that help humans reason better—tools that widen access to justice without compromising its integrity. That, we believe, is both the promise and the moral test of artificial intelligence in law.

Adapted from “Tasks and Roles in Legal AI,” 2025, with Allison Koenecke, David Mimno, and Matthew Wilkens.