How do you leak tens of millions of 4473s?
If each record is a PDF, it is not very searchable.
Yes, you can search for a string IN a PDF, but how do you search 54 million individual PDF files?
I'm sure a bunch of those PDFs are handwritten. How searchable would they be?
The missus is deep into genealogical research, and over the last several years she has watched AI parsing of handwritten documents (even archaic script) reach amazing new heights.
How Long Will It Take for the 1950 Census to Be Indexed?
The time it will take to index the census depends on how many wonderful volunteers dedicate their time to the effort! To give you some perspective of the scope of the project, approximately 132,164,569 persons were enumerated in the 1940 census. In contrast, the estimated population of the United States in 1950 was a little over 150 million.
In 2012, FamilySearch began the project to index the 1940 United States census in hopes of indexing the entire census in 6 months. With over 163,000 volunteers and several genealogical organizations contributing their time and efforts, the census was indexed in just 4 months except for Puerto Rico!
My friend’s dad owned a company which converted information on paper medical records into searchable databases. He was dealing with tens of thousands of documents per job. While 54 million is a lot of documents, I believe the technology exists to scan those in and create a searchable database.
Seriously? If you can search a single document for a string, you can search 50 billion, let alone 50 million.
It's just a matter of scale.
A searchable database is a LOT different from several million individual files.
The same way Google lets you search millions of PDFs. Software is used to grind through the contents and builds a searchable index. Totally routine these days.
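For anyone wondering what "builds a searchable index" means in practice, here is a rough sketch of the general idea (an inverted index), not how Google or anyone else actually does it. It assumes the text has already been pulled out of each PDF by OCR or similar, and the record IDs and sample text are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Map each word to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    # Return the IDs of documents containing every word in the query.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical text already extracted from three scanned records.
docs = {
    "record-0001": "John Smith, Springfield",
    "record-0002": "Jane Doe, Springfield",
    "record-0003": "John Doe, Shelbyville",
}

index = build_index(docs)
print(sorted(search(index, "john")))
print(sorted(search(index, "Springfield Doe")))
```

The slow part (reading every file) happens once, when the index is built. After that, a search is a lookup per word rather than a pass over millions of files, which is why the number of PDFs matters much less than you would expect.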