
    MocoJed

    Active Member
    Nov 16, 2015
    474
    Loco Moco
    3% chance the OP comes back in "about two weeks." Last I heard he was checking on a car out in his cul-de-sac...
     

    Occam

    Not Even ONE Indictment
    MDS Supporter
    Feb 24, 2018
    20,239
    Montgomery County
    Quote:
    How do you leak tens of millions of 4473s?

    The same way Manning exfilled untold hundreds of thousands of sensitive State Department cables and other information.

    Just about guaranteed that ATF doesn't consider those scanned (or OCR'd?) 4473s to be worth a lot of energy to protect from a serious inside actor.
     

    Pinecone

    Ultimate Member
    MDS Supporter
    Feb 4, 2013
    28,175
    If each record is a PDF, it is not very searchable.

    Yes, you can search for a string IN a PDF, but how do you search 54 million individual PDF files?
     

    Occam

    Not Even ONE Indictment
    MDS Supporter
    Feb 24, 2018
    20,239
    Montgomery County
    Pinecone said:
    If each record is a PDF, it is not very searchable.

    Yes, you can search for a string IN a PDF, but how do you search 54 million individual PDF files?

    The same way Google lets you search millions of PDFs. Software grinds through the contents and builds a searchable index. Totally routine these days.
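A toy sketch of what Occam describes, in Python: build an inverted index once (each word maps to the set of files containing it), then answer a query by intersecting those sets instead of rereading every file. It assumes the text has already been extracted from the PDFs; the file names and contents are made up, and this is not a description of how Google or any agency actually does it.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document IDs that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index itself is not modified
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample: text already extracted (e.g. OCR'd) from three scanned forms.
docs = {
    "form-0001.pdf": "John Smith, Rockville MD, pistol",
    "form-0002.pdf": "Jane Doe, Frederick MD, rifle",
    "form-0003.pdf": "John Doe, Towson MD, shotgun",
}
index = build_index(docs)
print(sorted(search(index, "john doe")))  # ['form-0003.pdf']
```

The expensive pass over every file happens once in `build_index`; after that, each search only touches the few sets for the query words.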
     

    ToolAA

    Ultimate Member
    MDS Supporter
    Jun 17, 2016
    10,500
    God's Country
    Pinecone said:
    If each record is a PDF, it is not very searchable.

    Yes, you can search for a string IN a PDF, but how do you search 54 million individual PDF files?


    My friend’s dad owned a company that converted information on paper medical records into searchable databases. He was dealing with tens of thousands of documents per job. While 54 million is a lot of documents, I believe the technology exists to scan those in and create a searchable database.
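A minimal sketch of the "searchable database" step, assuming the scanning/OCR step has already produced plain text for each page (that step is not shown). It uses Python's built-in `sqlite3` module; the table name, columns, and records are hypothetical.

```python
import sqlite3

# Hypothetical OCR output: (source file, extracted text) pairs.
ocr_results = [
    ("record-001.tif", "Patient: Mary Jones  DOB 1950-03-02"),
    ("record-002.tif", "Patient: Tom Brown  DOB 1962-11-19"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, just for the example
conn.execute("CREATE TABLE records (source_file TEXT PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", ocr_results)
conn.commit()

def find(term: str) -> list[str]:
    """Return source files whose text contains term (SQLite LIKE ignores ASCII case)."""
    # A LIKE '%term%' query still checks every row; large systems add a
    # full-text index on top. This shows only the load-and-query step.
    rows = conn.execute(
        "SELECT source_file FROM records WHERE body LIKE ? ORDER BY source_file",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]

print(find("brown"))  # ['record-002.tif']
```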
     

    Johnny5k

    Ultimate Member
    Nov 24, 2020
    1,021
    What do you think you are doing when you do the CAPTCHA??

    You are both translating difficult-to-read items for the scanning computers and training the AI to identify the text faster and more accurately. The AI is so good at text at this point that the CAPTCHAs are now mostly images, and they are getting fuzzier and fuzzier, making the AI better and better.
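A simplified model of the transcription half of what Johnny5k describes, loosely following the original reCAPTCHA idea of pairing a word the system already knows with one the OCR software could not read. Everything here (the names, the three-vote threshold, the sample words) is an assumption for illustration, not any vendor's real code.

```python
from collections import Counter, defaultdict
from typing import Optional

# Simplified model (an assumption, not any vendor's actual code): each challenge
# pairs a "control" word whose answer is already known with an "unknown" word
# that OCR software could not read.
KNOWN_ANSWERS = {"ctrl-1": "overlooks"}  # hypothetical control word
AGREEMENT_NEEDED = 3                     # hypothetical vote threshold
votes: dict[str, Counter] = defaultdict(Counter)

def submit(control_id: str, control_answer: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Pass the user only if the control word is right; only then count their other guess."""
    if control_answer.strip().lower() != KNOWN_ANSWERS[control_id]:
        return False  # wrong on the known word: reject and discard the unknown-word guess
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True

def transcription(unknown_id: str) -> Optional[str]:
    """Return the crowd's reading once enough users agree on it, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    answer, count = tally.most_common(1)[0]
    return answer if count >= AGREEMENT_NEEDED else None

# Usage: several users see the same unknown word; one gets the control word wrong.
print(submit("ctrl-1", "overlooks", "scan-42", "inquiry"))  # True
print(submit("ctrl-1", "overbooks", "scan-42", "enquiry"))  # False, guess discarded
print(submit("ctrl-1", "Overlooks", "scan-42", "Inquiry"))  # True
print(transcription("scan-42"))                             # None, only 2 votes so far
print(submit("ctrl-1", "overlooks", "scan-42", "inquiry"))  # True
print(transcription("scan-42"))                             # inquiry
```

The agreed-upon readings are what would then become labeled training data for the OCR model.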
     

    Occam

    Not Even ONE Indictment
    MDS Supporter
    Feb 24, 2018
    20,239
    Montgomery County
    Quote:
    I'm sure a bunch of those PDFs are handwritten. How searchable would they be?

    The missus is deep into genealogical research and has watched AI parsing of handwritten (even archaic script) documents reach amazing new heights over the last several years.
     

    Johnny5k

    Ultimate Member
    Nov 24, 2020
    1,021
    Occam said:
    The missus is deep into genealogical research and has watched AI parsing of handwritten (even archaic script) documents reach amazing new heights over the last several years.

    Yep, they crowdsource the AI training using CAPTCHA.
     

    Blacksmith101

    Grumpy Old Man
    Jun 22, 2012
    22,163
    Occam said:
    The missus is deep into genealogical research and has watched AI parsing of handwritten (even archaic script) documents reach amazing new heights over the last several years.

    For example, every US Census from 1940 back to 1790 is indexed, searchable, linked to the original images, and available from multiple sources for free. [The 1940 census is made up of 3.8 million images, scanned from over 4,000 rolls of microfilm.] It's a great genealogical tool.

    Here is a link to the 1940 census at the US Archives. See if you can find your parents or grandparents. (The Archives site allows search by location or enumeration district and shows the actual census image; other sources also allow search by name and other data.)
    https://1940census.archives.gov/getting-started/

    Census records are also available from:

    Maryland public libraries, free from home with a library card, in the Heritage Quest database on the libraries' Sailor digital reference system:
    https://www.sailor.lib.md.us/services/databases/ (It is next to last in the list; searchable by name.)

    FamilySearch, with a free login (this is the quickest and most comprehensive, searchable by name, relationship, etc.; it's what I use most):
    https://www.familysearch.org/en/

    Ancestry.com with a subscription. (I haven't used this because I am cheap)


    For more information on the various censuses (what they contain, available sources, etc.):
    https://www.cyndislist.com/us/census/

    The 1950 census will be released to the public on 1 April 2022, and it will get scanned and indexed as quickly as possible.
    How Long Will It Take for the 1950 Census to Be Indexed?

    The time it will take to index the census depends on how many wonderful volunteers dedicate their time to the effort! To give you some perspective of the scope of the project, approximately 132,164,569 persons were enumerated in the 1940 census. In contrast, the estimated population of the United States in 1950 was a little over 150 million.

    In 2012, FamilySearch began the project to index the 1940 United States census in hopes of indexing the entire census in 6 months. With over 163,000 volunteers and several genealogical organizations contributing their time and efforts, the census was indexed in just 4 months except for Puerto Rico!
     

    Bullfrog

    Ultimate Member
    Oct 8, 2009
    15,160
    Carroll County
    Pinecone said:
    If each record is a PDF, it is not very searchable.

    Yes, you can search for a string IN a PDF, but how do you search 54 million individual PDF files?

    Seriously? If you can search a single document for a string you can search 50 billion, let alone 50 million.

    It's just a matter of scale.
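Bullfrog's "matter of scale" can be made concrete with a back-of-envelope estimate: a brute-force scan costs time in proportion to the number of files, and every new search pays that cost again. Only the 54 million figure comes from the thread; the files-per-second rates are made-up assumptions, since real speed depends on disks, file sizes, and hardware.

```python
# Back-of-envelope: time for a brute-force scan that opens and searches every file.
# The per-second rates are assumptions for illustration, not measurements.
NUM_FILES = 54_000_000  # the figure quoted in this thread

def scan_hours(num_files: int, files_per_second: float) -> float:
    """Hours needed to search num_files files at a steady rate, one pass, no index."""
    return num_files / files_per_second / 3600

for rate in (100, 1_000, 10_000):  # hypothetical files searched per second
    print(f"{rate:>6,} files/s -> {scan_hours(NUM_FILES, rate):6.1f} hours per search")
```

At an assumed 1,000 files per second, one pass takes 15 hours. With an index, that full pass happens once up front instead of once per search.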
     

    Pinecone

    Ultimate Member
    MDS Supporter
    Feb 4, 2013
    28,175
    ToolAA said:
    My friend’s dad owned a company that converted information on paper medical records into searchable databases. He was dealing with tens of thousands of documents per job. While 54 million is a lot of documents, I believe the technology exists to scan those in and create a searchable database.

    A searchable database is a LOT different from several million individual files.
     

    Pinecone

    Ultimate Member
    MDS Supporter
    Feb 4, 2013
    28,175
    Bullfrog said:
    Seriously? If you can search a single document for a string you can search 50 billion, let alone 50 million.

    It's just a matter of scale.

    If you have enough time. :)

    The point is, are they just scanning them for storage, or are they indexing them?

    IIRC, they are prohibited from having a searchable database of them.
     

    Bullfrog

    Ultimate Member
    Oct 8, 2009
    15,160
    Carroll County
    Pinecone said:
    A searchable database is a LOT different from several million individual files.

    No, it's not.

    The only difference is in the code you write to do the searching.

    A million or 50 million individual files, plus a few minutes (or maybe a couple of hours at most) of coding, IS a searchable database.
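A sketch of the "few minutes of coding" Bullfrog means: a brute-force search that opens every file in a folder and reports the ones containing a string. It assumes the text has already been extracted into `.txt` files (real scanned PDFs would need an OCR or text-extraction step first, not shown); the function name and sample files are hypothetical.

```python
import tempfile
from pathlib import Path

def grep_files(folder: str, needle: str, pattern: str = "*.txt") -> list[str]:
    """Brute-force search: open every matching file under folder and list those containing needle."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(folder).rglob(pattern)):  # walks subfolders too
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():  # case-insensitive substring match
            hits.append(str(path))
    return hits

# Usage with a throwaway folder of hypothetical extracted-text files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Buyer: John Smith", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Buyer: Jane Doe", encoding="utf-8")
    print([Path(p).name for p in grep_files(tmp, "smith")])  # ['a.txt']
```

This is exactly the approach that works but reads every file on every search; the index-based approach trades a one-time build for fast repeated queries.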
     

    Sundazes

    My brain hurts
    MDS Supporter
    Nov 13, 2006
    21,306
    Arkham
    Occam said:
    The same way Google lets you search millions of PDFs. Software grinds through the contents and builds a searchable index. Totally routine these days.

    This. I know of entities that are indexing 20 TB a day once the docs are OCR'd.
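For scale, Sundazes' 20 TB a day works out to a sustained rate of roughly 231 MB/s, assuming decimal terabytes (10^12 bytes) and that the load is spread evenly over 24 hours:

```latex
\frac{20\ \text{TB}}{1\ \text{day}}
  = \frac{20 \times 10^{12}\ \text{bytes}}{86\,400\ \text{s}}
  \approx 2.31 \times 10^{8}\ \text{bytes/s}
  \approx 231\ \text{MB/s}
```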
     
