DeepSeek has unveiled DeepSeek-OCR: Contexts Optical Compression, an open-source model developed by its DeepSeek-AI research team.
The new system introduces a vision-based method to compress long text contexts, improving recognition efficiency while cutting computation costs.
According to the team, DeepSeek-OCR surpasses several mainstream models in benchmark tests while using far fewer visual tokens. It can also generate more than 200,000 pages of training data per day on a single A100-40G GPU, supporting the development of both large language models and vision-language models.
The open-source DeepSeek-OCR (optical character recognition) model, available via online developer platforms Hugging Face and GitHub, was the result of an “investigation into the role of vision encoders” to compress text for large language models (LLMs), the Hangzhou-based AI start-up said in a blog post.
By using that approach, LLMs would be able to process a massive amount of text without incurring a proportional increase in computing cost.
“Through DeepSeek-OCR, we demonstrated that vision-text compression can achieve significant token reduction – seven to 20 times – for different historical context stages, offering a promising direction” to address long-context challenges in LLMs, the company said.
According to the company’s blog post, DeepSeek-OCR consists of two main components: DeepEncoder, which serves as the encoder, and DeepSeek3B-MoE-A570M, which serves as the decoder.
DeepEncoder acts as the model’s core engine. It keeps activations low even with high-resolution inputs, while achieving high compression ratios that reduce the number of visual tokens.
The decoder, a Mixture-of-Experts (MoE) model with about 570 million activated parameters (the “A570M” in its name), reconstructs the original text. An MoE architecture divides the model into separate sub-networks, or “experts”, each specialising in a subset of the input data, and only some of them are activated for any given input.
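To illustrate the general idea of expert routing, here is a minimal Python sketch of generic top-k gating, a common way MoE layers select experts. It is not DeepSeek’s implementation: the function names, the four toy scalar “experts”, the router scores and the choice of `top_k=2` are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_scores, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs.

    Hypothetical toy layer: real MoE experts are neural sub-networks,
    and the router scores are computed from the input itself.
    """
    probs = softmax(router_scores)
    # Pick the indices of the top_k most probable experts.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise the chosen experts' weights so they sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen

# Four toy experts (assumed for illustration only).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
# Illustrative router scores: experts 1 and 3 score highest.
output, chosen = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts)
print(chosen, output)
```

Only the selected experts run for a given input, which is why the number of activated parameters in an MoE model can be much smaller than its total size.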
Apart from handling standard vision tasks such as image captioning and object detection, DeepSeek-OCR can also be used to parse highly structured visual content – including tables, formulas and geometric diagrams – which can benefit its application in the fields of finance and science, according to the company.
Citing benchmark tests, the company said that when the number of text tokens is within ten times the number of visual tokens – meaning a compression ratio below 10× – DeepSeek-OCR achieves 97 per cent decoding accuracy.
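The compression ratio described here is simply the number of text tokens divided by the number of visual tokens used to represent them. The short Python sketch below shows the arithmetic. The function names and the token counts are hypothetical examples, not figures from the company, and treating a ratio of exactly 10 as outside the range is an assumption.

```python
def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Text tokens represented per visual token."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens

def below_reported_threshold(text_tokens: int, visual_tokens: int,
                             max_ratio: float = 10.0) -> bool:
    """True when the ratio is below 10x, the range the article ties to
    97 per cent decoding accuracy. The strict '<' is an assumption."""
    return compression_ratio(text_tokens, visual_tokens) < max_ratio

# Hypothetical page: 900 text tokens encoded as 100 visual tokens.
print(compression_ratio(900, 100))
print(below_reported_threshold(900, 100))
# Hypothetical denser page: 2,000 text tokens in the same 100 visual tokens.
print(below_reported_threshold(2000, 100))
```

In this example the first page has a ratio of 9×, inside the range the company reported, while the second reaches 20×, the upper end of the token reduction it cited.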
Source: TechNode, South China Morning Post
Bd-pratidin English/ ANI