Proper validation can accelerate sequence-based discovery of proteins and protein-coding genes. Databases currently contain a backlog of experimentally unverified gene models and of observed transcripts only tentatively classified as coding or noncoding RNA. We present and apply a general principle, founded on base composition and the genetic code and validated here by bulk 2-D gels, that can improve the reliability of such classifications and of the algorithms and pipelines that produce them.